SECTION A WHY DIGITAL?
1.1. INTRODUCTION.


* It is pertinent at this early stage of the course to be sure of the answer to the question so often
asked, "Why is it necessary to go digital?". Indeed, why are computers and digits so manifest in this
day and age? There is a range of stock (some would say obvious) answers to this question. They are
of particular significance as the television industry is only now beginning to accept the universal
application of digital audio and video signals and systems. At this time some 70-80% of the industry
uses primarily analogue equipment and systems. That is to
say, studio installations and network centres, along with their associated transmission lines and links,
are dominated by analogue signals and the relevant measurement and appraisal techniques. For most
technicians and engineers working in the industry, their main experience has necessarily been on
analogue systems. Video technicians go one step further inasmuch as most of their understanding of
analogue video theory is centred on composite coding methods.


* The analogue based professional television engineer need not feel disadvantaged, however,
because it is not possible to possess an in-depth understanding of digital audio and video systems
without a thorough knowledge of the analogue forms from which the digital versions are derived.
This is particularly true for the digitisation of composite video signals in the PAL and NTSC
domains. Conversely, if you are not familiar with analogue television then you may find your
understanding of the main thrust of this course somewhat inhibited. This first session is designed to at
least revise some of the more important aspects of the analogue video approach.


* It is pertinent to commence the discussion with the question so often asked: 'Why is it better to
use digital signal processing?'


Some principal reasons frequently given are:-
i) A well designed digital system is immune to noise.
ii) A well designed digital system is immune to amplitude non-linearity.
iii) The domination of cheap, readily available, mass produced ICs means that digital
technology is simple to implement when compared with analogue methods.


* These arguments are somewhat misconceived since they do not apply to digital systems in general.
They describe the binary case, which has only two states. A digital system is not
exclusively twofold. Digital means many states, the only limiting factor being that a finite number of
values is defined for the system under discussion. If a process uses a multiple level code, most of these
arguments lose a degree of validity.


* As an example, we may consider the Digital Sound In Syncs (DSIS) system which is used
extensively by various television networks in the UK. This is a method of carrying NICAM Stereo
Sound as a digital signal inside the line sync pulse duration of 4.7 µs (625/50). This enables the audio
data to pass along a common path, being contained within the video signal. Restrictions imposed by
the video bandwidth and the limited time available within the line sync pulse mean that, in order to
carry two high quality audio channels, the coding method employed cannot be binary. The problem is
solved by using a quaternary (four level) code. Therefore, for this digital system it can be said that:-


(a) It is not immune to noise. 
(b) It is not immune to amplitude non-linearity. 
(c) Standard ICs are not readily available to implement quaternary signal processing. 
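
The loss of noise immunity with a multi-level code can be illustrated numerically. The sketch below is not taken from the DSIS specification: it assumes equally spaced levels within a fixed peak to peak amplitude and decision thresholds midway between adjacent levels. The function name `noise_margin` and the 1.0 V amplitude are illustrative only.

```python
def noise_margin(levels: int, peak_to_peak: float = 1.0) -> float:
    """Largest noise excursion tolerated before a level is misread.

    Assumes equally spaced levels and decision thresholds midway
    between adjacent levels (an illustrative model, not the DSIS spec).
    """
    if levels < 2:
        raise ValueError("a digital code needs at least two levels")
    spacing = peak_to_peak / (levels - 1)
    return spacing / 2


binary = noise_margin(2)       # two-level (binary) code
quaternary = noise_margin(4)   # four-level (quaternary) code
print(f"binary margin:     {binary:.3f} V")
print(f"quaternary margin: {quaternary:.3f} V")
print(f"ratio:             {binary / quaternary:.1f}")
```

Under these assumptions, for the same signal amplitude the quaternary code tolerates only one third of the noise excursion that the binary code does, which is the sense in which it is "not immune to noise".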


* It would appear that the reasons behind the desire to use digital systems for audio and video 
signal manipulation are more subtle than the classic answers given above. 


* Let us therefore reconsider the question. There is no doubt that for most applications in
communications electronics, analogue methods must give way to digital implementation eventually.
Yes, there are a number of strong reasons for the domination of digital technology. In the first place,
although the rather pedantic contradictions put forward above are fundamentally correct, in point of
fact they do not carry a great deal of weight. This is because any multi-level digital system can, with
relative ease, be converted to a binary mode by the process of Pulse Code Modulation (PCM), to be
discussed later in this chapter. Bearing this in mind, it follows that some reasons for digital
technological dominance are:-


1) The availability of cheap binary processors and computers etc. means that any other
approach is likely to be too costly. 


* Digital techniques are now applied in practically all aspects of our everyday experience. The
television programme that demonstrates new technological developments using the perpetual phrase
"... and is controlled by a computer" is tediously familiar to us all! This effectively dictates the
binary digital approach as the first option for the designer, even if it involves what appears at first
sight to be a laborious transformation into the binary domain in order to implement the task.


2) Binary digital processing represents an unavoidable standard 'platform' for all
branches of electronics.


* To appreciate this, one should realise that historically, analogue electronics tended to 
produce specific components for a particular application. Vision and sound are two classic examples. 
Specialist circuit techniques were needed to develop video systems. Design engineers tended to 
specialise in video or audio circuit techniques selecting particular components and devices to solve 
problems that they encountered. One only has to consult a transistor data catalogue to see the truth of 
this. 




* The modern approach differs inasmuch as design engineers now write audio and video
software. In fact they can write software to solve almost any problem. The rapid evolution of
'Platform' or 'Non Linear Editing' demonstrates this. What has been the exclusive domain of the
television and sound technologist is about to be engulfed by the computer industry. Within a few
years, specialist television analogue and digital electronics as it is currently understood may well not
exist. Digital compressed platform video and audio systems will use standard computer software,
i.e. common operating systems, computer files, file manipulation, disk storage etc. Sophisticated
compression techniques mean that video and audio links will be able to use standard computer
highways. Compression techniques and standards are discussed later in the course.


3) The application of cheap binary digital electronics means that almost any idea or
concept can be demonstrated.


This is a most significant argument, and is confirmed by the incredible picture manipulation
and synthesis now available to programme makers. Hypotheses relating to complex image
manipulation etc. can be implemented even though 'state of the art' technology may not be available
to produce an economically viable product. In 1988, the BBC demonstrated their experimental results on
Phase Correlation Motion Estimation (Ph.C), a new approach to scanning standards conversion.
At the time, the process was applied to about one quarter of a monochrome picture area to prove the
hypothesis. Although the process utilised several mainframe computers, there was still insufficient
mathematical power available to process the whole frame. No matter, the principle was seen to work.
Some four to five years later a major equipment manufacturer produced a relatively small (but costly)
unit to perform the same process on the whole picture area. It is apparent then, that one of the biggest
advantages of binary digital electronics when applied to television is that practically anything is
conceptually possible given sufficient computing power.


The recent Super High Definition TV 35 mm negative film quality digital video post
production technique is a further case in point. 35 mm rushes are transferred to video at some 40
MBytes per frame ('Top End' 601 '4:2:2' digital video uses about 1 MByte per frame). Electronic post
production is then applied using superior electronic keying, exploiting the electronic image
manipulation available in this domain. (The idea of attempting a four page turn as an optical effect
and still maintaining the original negative quality is interesting to contemplate!) The result is then
printed back onto film stock. This is another staggering demonstration of the enormous potential of
digital techniques.
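
The figure of about 1 MByte per frame for 601 '4:2:2' video can be checked with a short calculation. The sketch below assumes the 625/50 active picture dimensions of Recommendation 601 (720 luminance samples and 360 samples for each colour difference signal per line, 576 active lines) and 8-bit samples; none of these figures is quoted in these notes, and the function name is illustrative.

```python
# Active-picture sample counts for 625/50 '4:2:2' working (assumed from
# Recommendation 601; the notes themselves quote only the rounded total).
LUMA_SAMPLES_PER_LINE = 720
CHROMA_SAMPLES_PER_LINE = 360   # each of the two colour difference signals
ACTIVE_LINES = 576
BYTES_PER_SAMPLE = 1            # 8-bit samples


def bytes_per_frame() -> int:
    """Return the active-picture size of one 4:2:2 frame in bytes."""
    samples_per_line = LUMA_SAMPLES_PER_LINE + 2 * CHROMA_SAMPLES_PER_LINE
    return samples_per_line * ACTIVE_LINES * BYTES_PER_SAMPLE


frame_bytes = bytes_per_frame()
print(f"4:2:2 frame: {frame_bytes} bytes ({frame_bytes / 1e6:.2f} MByte)")
print(f"a 40 MByte frame is about {40e6 / frame_bytes:.0f} times larger")
```

On these assumptions the active picture comes to a little under 1 MByte, so the film resolution transfer carries several tens of times more data per frame.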


A further example. 


Some years ago the audio fraternity declared that digital audio would only be of value if it
were possible to edit in the same physical manner as analogue 'open reel' audio tape. Subsequently, a
family of stationary head digital audio tape machines was developed under the acronym 'DASH'
(Digital Audio Stationary Head). The 1⁄4" version was designed to mimic its analogue predecessor,
enabling physical cut and splice editing with manual spool jogging etc. Unfortunately the device
proved to be of short term value, audio operators quickly realising that digital processes offered much
more flexibility if a completely new approach was accepted. A modern digital audio work station
using disk storage gives massive post production power to the audio operator through an interactive
menu display, implemented of course via the inevitable computer. Consequently, most 1⁄4" DASH
machines stand idle, collecting dust! The point is that such digital audio techniques were
conceptually feasible many years before economically viable systems became available; they were, after
all, already used for video post production. The added problem here was the reluctance of the
profession to accept a new approach. Some would say that the development of DASH was a waste of
resource since it was inevitable that it had no long or medium term future. Maybe it would have been
more tactical to have concentrated on disk storage (which was the case with video systems, since
there was no history of cut and splice editing to blur the objective), and to have skipped the DASH
episode altogether?


* Having accepted that digital application is inevitable, there arises the question of digital
standards and how they are to be monitored and maintained. There is a common belief, postulated by
many, that the monitoring and measurement of digital performance is of little value. This is perhaps
true if traditional analogue testing and monitoring methods are to be the evaluation tool. Undoubtedly
it is difficult to see exactly what the merit of such measurements would be for digital signals.
Nonetheless it is certain that experience will show that some measurement is required, albeit of a
different form. Experience will also promote an intuitive response to these measurements just as it did
with analogue operations. This cannot happen unless the operator/engineer is aware of the state of
the system in use; therefore monitoring the condition of a digital system is essential if professional
standards are to be maintained. Possibly the main reason for the popular negative view arises from
ignorance and, dare I say, fear of a system which is difficult to interpret. After all, to those who have had so
much experience of oscilloscope/waveform monitor displays, which give a clear indication as to the
nature of the faults in the system and often their cause, digital systems seem to be invisible. This is
merely because years of consolidated knowledge and experience, gained by observing and
manipulating the old technology with sophisticated and proven test equipment, are not available for
digital operations. However, as the user becomes more familiar with the subject, the problems that
inevitably must arise will be better understood. This will enable digital systems to function in a more
dependable fashion as a result of objective awareness of the system's performance parameters.


SECTION B ANALOGUE VIDEO THEORY 




NOTE

The discussion will commence with a brief recap of the main aspects of the analogue video signal
structure. Additional material will be used during this early part of the lecture and most of the
diagrams to be used are reproduced at the end of these notes. Other material may be made available
at the session, but no pre-prepared formal notes are provided on these basic topics. What follows is a
comprehensive documentation of the core material for later reference purposes; it is not a transcript
of the lecture.

1. Summary Recap of the PAL Coding Process 


PAL was conceived as an improved version of NTSC. Both processes attempted to cram three pints 
into a pint pot!! 


RGB signals are matrixed to form the Y, R-Y, and B-Y signals. 


The B-Y signal is band limited to 1.3 MHz and weighted to become the U signal. It is then 
Suppressed Carrier Amplitude Modulated as the U RF signal onto the U axis as seen on a Vectorscope 
display. 


The R-Y signal is band limited to 1.3 MHz and weighted to become the V signal. It is then 
Suppressed Carrier Amplitude Modulated onto an axis at 90 degrees (in quadrature) to the U RF 
signal to define the V axis. These two RF signals are added in quadrature, defined as Quadrature 
Amplitude Modulation (QAM). The V axis is switched on alternate lines (Phase Alternate Line) to 
alleviate the differential phase problems that arose in the original NTSC approach. The resultant 
chroma signal is "piggy backed" onto the luminance signal to form the PAL composite colour video 
signal. 
Figure: The PAL encoding and decoding process
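
The matrixing, weighting and quadrature addition described above can be sketched numerically. The luminance coefficients (0.299, 0.587, 0.114) and the U and V weighting factors (0.493, 0.877) used below are the usual PAL values; they are assumed here because these notes do not quote them, and the function names are illustrative. Band limiting and the actual modulation onto a carrier are not modelled.

```python
import math


def rgb_to_yuv(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Matrix gamma-corrected R, G, B (0..1) into Y, U, V.

    The coefficients are the usual PAL values, assumed here because
    the notes do not quote them.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.493 * (b - y)   # weighted B-Y
    v = 0.877 * (r - y)   # weighted R-Y
    return y, u, v


def chroma_vector(u: float, v: float, v_switch: int) -> tuple[float, float]:
    """Return (amplitude, phase in degrees) of the QAM chroma signal.

    v_switch is +1 on one line and -1 on the next, modelling the
    Phase Alternate Line inversion of the V axis.
    """
    amplitude = math.hypot(u, v)
    phase = math.degrees(math.atan2(v_switch * v, u)) % 360
    return amplitude, phase


y, u, v = rgb_to_yuv(1.0, 0.0, 0.0)          # saturated red
amp_a, phase_a = chroma_vector(u, v, +1)      # line with V normal
amp_b, phase_b = chroma_vector(u, v, -1)      # line with V inverted
print(f"Y={y:.3f} U={u:.3f} V={v:.3f}")
print(f"line n:   amplitude={amp_a:.3f} phase={phase_a:.1f} deg")
print(f"line n+1: amplitude={amp_b:.3f} phase={phase_b:.1f} deg")
```

The chroma amplitude is the same on both lines; only the phase is mirrored about the U axis, which is what appears on a Vectorscope as the two sets of PAL vectors.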


The sub-carrier is chosen to be an odd multiple of 1/4 line frequency such that chroma energy may be 
concentrated between the line harmonics where the luminance energy is minimal. It is chosen to be at 
the upper end of the luminance spectrum but low enough to accommodate the upper side band of the 
QAM chroma signal. Thus:- 


PAL:
$$f_{SC} = \left(284 - \tfrac{1}{4}\right) f_H + \tfrac{f_V}{2}\ \text{Hz}$$
where:-
$f_H$ = Line frequency (Hz)
$f_V$ = Field frequency (Hz)
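
As a worked check of the sub carrier expression, the sketch below evaluates it exactly for the 625/50 system, assuming a line frequency of 15625 Hz and a field frequency of 50 Hz (standard values that are not stated explicitly at this point in the notes). The constant and function names are illustrative.

```python
from fractions import Fraction

LINE_FREQUENCY_HZ = Fraction(15625)   # 625 lines x 25 frames per second
FIELD_FREQUENCY_HZ = Fraction(50)


def pal_subcarrier_hz() -> Fraction:
    """Evaluate (284 - 1/4) x line frequency + field frequency / 2 exactly."""
    return (284 - Fraction(1, 4)) * LINE_FREQUENCY_HZ + FIELD_FREQUENCY_HZ / 2


f_sc = pal_subcarrier_hz()
print(f"PAL sub carrier = {float(f_sc):.2f} Hz")
```

The result is the familiar 4.43 MHz figure used throughout the rest of these notes.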


Some other considerations on choice of sub carrier are:-
i). Interference between luminance and chrominance (Y/C crosstalk) is kept to a minimum.

ii). Under most normal circumstances energy at the upper end of the luminance spectrum will be
small, thus supporting (i) above.

iii). The resultant dot pattern exhibited on a monochrome receiver will appear as a fine mesh
pattern near the limit of resolution and must not be objectionable to the viewer.

Essentially, a total 8.1 MHz information band width comprising the luminance and
chrominance is analogue compressed within the 5.5 MHz base band single data link by a
process of frequency interleaving.


2. PAL Impairments:- 
(i) Dot Pattern


The PAL signal should, when applied to monochrome monitors, yield pictures with minimal
impairment. This was a major design consideration (Monochrome Compatibility).


In fact, by modern standards, monochrome receivers/monitors display objectionable dot
patterns on highly saturated colour areas.


(ii) Luminance/Chrominance Separation


The luminance signal is difficult to recover and indeed it has been common practice not to
bother with Y/C separation. Ignorance of this limitation can lead to major problems in
multiple coding paths, e.g. RGB to PAL to RGB to PAL etc., or more recently YUV to PAL to
YUV to PAL etc. when operating in the PAL/Component hybrid domain, i.e. we tend to
interpret the 'luminance' signal as the Y + Ch (as with monochrome displays) for other signal
processing applications; this can lead to major complications.


(iii)
The most common method to overcome (ii) above is to incorporate a chroma notch filter in
the luminance channel. This is normally a uni-dimensional analogue band rejection filter; it
destroys horizontal luminance resolution in the sub carrier (Fsc) region (4.43 MHz PAL). All colour
receivers require this filter to eliminate the colour desaturation that would arise without it,
at the cost of the resultant loss of horizontal resolution. Modern adaptive comb
filter encoders and decoders overcome this problem to some extent by cutting holes in the
luminance signal spectrum to accommodate the chroma. This is still a compromise.


(iv) Luminance/Chrominance (Y-Ch) Cross Talk


It is common practice in a PAL decoder to use a unidimensional band pass filter to extract
the chroma component concentrated about 4.43 MHz. Such a filter obviously will include
any luminance energy contained within that region; this will be passed to the chroma circuits
for demodulation back to baseband, causing cross colour. The reverse of this has already been
highlighted, inasmuch as chroma information finds its way into the luminance processing
circuits. This gives rise to Y/C cross talk impairments, one of which was described above.
Moreover, the insertion of chroma information in between the line harmonics is based upon
the assumption that this region contains no appreciable luminance energy. This is a false
premise; the amount of luminance energy contained at odd multiples of 1/4 line frequency is
variable and will depend upon scene content and, in particular, on the amount of inter frame
or inter field movement.


(v) Constant Luminance Principle


It is well established that the colour difference signals contain luminance information and
thus the luminance signal as applied to monochrome receivers does not give true
monochromatic rendition of the scene. This has been identified as the failure of constant
luminance and is present in both the PAL and NTSC systems. It is, however, offset by the presence
of the sub carrier wave form displayed on a monochrome receiver. This is a symmetrical
phase modulated sine wave about the given luminance level; the gamma characteristic of the
tube modifies this sine wave for increased brightness at lower excitation levels and to some
extent compensates for loss of constant luminance.
Changes in brightness, due to non-linearity and sub-carrier dot pattern, of full amplitude
saturated colours, as displayed on a monochrome television receiver

(a) Y values as transmitted
(b) Y as displayed without dot pattern (sub-carrier), Gamma = 2.2
(c) Y as displayed with dot pattern (sub-carrier)

Colour      (c)
White       1.0
Yellow      0.91
Cyan        0.69
Green       0.52
Magenta     0.32
Red         0.254
Blue        0.084
Black       0


Constant luminance failure 


(Principles of PAL Colour Television and Related Systems: H.V. Simms, Newnes Technical Books) 
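
The compensating effect of the sub carrier on a monochrome display can be illustrated with a simple numerical model. The sketch below is an assumption-laden illustration, not the method used to produce the published table: the tube is modelled as a pure power law with gamma 2.2, negative drive is treated as cut-off, and the green bar values (Y = 0.587, chroma amplitude about 0.59 for 100% bars) are standard figures not quoted in these notes.

```python
import math


def displayed_brightness(y: float, chroma_amplitude: float,
                         gamma: float = 2.2, steps: int = 3600) -> float:
    """Mean light output over one sub carrier cycle on a gamma-law tube.

    The drive is y + chroma_amplitude * sin(theta); negative drive is
    treated as cut-off (zero light). Simple numerical averaging.
    """
    total = 0.0
    for k in range(steps):
        theta = 2 * math.pi * k / steps
        drive = max(0.0, y + chroma_amplitude * math.sin(theta))
        total += drive ** gamma
    return total / steps


# Green bar of a 100% colour bar signal: Y = 0.587, chroma amplitude
# about 0.59 (assumed standard values, not quoted in the notes).
without_sc = displayed_brightness(0.587, 0.0)
with_sc = displayed_brightness(0.587, 0.59)
print(f"without sub carrier: {without_sc:.2f}")
print(f"with sub carrier:    {with_sc:.2f}")
```

Because the tube characteristic curves upwards, the positive half cycles of the sub carrier add more light than the negative half cycles remove, so the mean brightness with the dot pattern present is higher than without it.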


(vi) Restricted Chroma Band Width


The acuity of the eye to colour vision is not the same as that for monochrome. This allowed
the reduction of the chroma band width to about 1.3 MHz. Thus the PAL system
renders a luminance band width of 5.5 MHz and a band width for each of the colour
difference signals of approximately 1.3 MHz. Again, for transmission (i.e. viewing)
purposes, this has been a perfectly reasonable consideration. For chroma key or matte effects
it was convenient to use a colour difference signal, usually the B-Y. This signal only
had a nominal band width of 1.3 MHz, which gave difficulties in accurate keying on fine
detail. Any proposed new system would require a greater band width for the colour
information in order to facilitate convincing image processing such as chroma key.


Further complications result from the restricted band width of the colour difference signals.
The luminance component originates as a full band width signal, whereas the colour
difference signals are restricted in band width. For example, attempts to recover the
luminance signal from the colour difference signals would only be valid within the 1.3 MHz
band. This limitation complicates the various methods used to extract detail information in
camera systems. Older delegates may remember the difficult days of four tube cameras and
Delta L correction.


(vii) Differential Phase and Differential Gain Errors


The reference sub carrier phase for one horizontal line of chroma information is given at the
beginning of the line at blanking level by the colour burst. It refers to the phase of the RF
chroma information along that line which has an average value of 0 volts, i.e. pure AC.
When this signal is piggy backed onto the luminance signal the original chroma information
has an added DC value dependent upon the brightness value. Hence the chroma is raised by
a given luminance value that could approach 700 mV. If the processing circuitry is such that
variation of the instantaneous DC value of the signal causes a change of phase and/or
amplitude, then differential phase and/or differential gain errors arise.


(viii) Time Base Errors


Video tape recorders suffer from variations in head to tape velocity due to the
electro-mechanical transducer mechanisms involved. Again, the colour sub carrier reference is given at
the beginning of the line and assumes that the time base progression to the next line will be
linear and at the specified rate. In reality various time base error functions are always
present. In particular, first order velocity errors will produce a left to right hue shift for
NTSC recordings, or a saturation error for PAL, unless the necessary time base
correction precautions are taken. The Y signal and its associated display are tolerant of
quite large timebase errors (cue VHS!), as would be true of any baseband signal such as RGB
or YUV. For a PAL signal, however, one sub carrier cycle lasts only about 226 ns, so a timebase
error of about 113 ns represents a 180 degree swing on the
Vectorscope display. There is also the fact that this error will vary drastically in a random
manner, prohibiting the sub carrier regenerator in the decoder from achieving any form of
chroma lock.
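
The sensitivity of the sub carrier phase to timing errors follows directly from the sub carrier period. The sketch below converts a timing error into a phase shift, using the PAL sub carrier frequency of 4433618.75 Hz; the function name and the sample error values are illustrative only.

```python
PAL_SUBCARRIER_HZ = 4_433_618.75


def phase_error_degrees(timing_error_s: float,
                        subcarrier_hz: float = PAL_SUBCARRIER_HZ) -> float:
    """Sub carrier phase shift caused by a given timing error."""
    return 360.0 * subcarrier_hz * timing_error_s


period_ns = 1e9 / PAL_SUBCARRIER_HZ
print(f"sub carrier period: {period_ns:.1f} ns")
for error_ns in (1, 10, 113, 226):
    degrees = phase_error_degrees(error_ns * 1e-9)
    print(f"{error_ns:4d} ns -> {degrees:6.1f} degrees")
```

Even a few nanoseconds of uncorrected timing error therefore amounts to several degrees of sub carrier phase, whereas the same error is invisible on a baseband signal.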


(ix) Noise Cancellation in Generating the Composite Signals


In generating a PAL signal, adding proportions of the R, G and B signals to obtain a luminance
signal yields a noise advantage inasmuch as the noise adds as the root of the sum of the
squares. In the case of the cancellation effects designed into the colour difference signals to
yield the primary voltages, however, noise is not cancelled and continues to add as the root of
the sum of the squares. Thus a zero colour difference signal would still yield noise, inasmuch
as two equal but opposite polarity signals would cancel, but the noise components would add
in the manner described above. This is, of course, still true of the component system that is to be
discussed.
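
The root sum of squares behaviour can be put into numbers. The sketch below assumes equal, uncorrelated noise of one unit RMS on each of the R, G and B signals and uses the standard luminance coefficients (0.299, 0.587, 0.114), which are not quoted in these notes; the helper name `rss` is illustrative.

```python
import math

# Luminance coefficients for R, G, B (standard values, assumed here).
KR, KG, KB = 0.299, 0.587, 0.114


def rss(*weights: float) -> float:
    """Root of the sum of the squares of the channel weights."""
    return math.sqrt(sum(w * w for w in weights))


# Equal, uncorrelated noise of 1 unit RMS on each of R, G and B.
y_noise = rss(KR, KG, KB)                 # Y = KR*R + KG*G + KB*B
r_y_noise = rss(1 - KR, -KG, -KB)         # R-Y = (1-KR)*R - KG*G - KB*B
b_y_noise = rss(-KR, -KG, 1 - KB)         # B-Y = -KR*R - KG*G + (1-KB)*B

print(f"Y   noise: {y_noise:.3f}")
print(f"R-Y noise: {r_y_noise:.3f}")
print(f"B-Y noise: {b_y_noise:.3f}")
```

On these assumptions the luminance noise is lower than that of any single primary, while the colour difference signals carry noise of roughly the same order as a primary even on a grey scene where the colour difference signal itself is zero.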
(x) 8 Field Sequence


The intricacies of this overworked topic seem to evade many people. It is, in fact,
quite simple if we consider the following:-


In the equation given above that defines the sub carrier frequency for PAL, the use of quarter
line offset for frequency interleaving is specified, i.e. 283 3⁄4 fH. Stated another way, and
ignoring the minor effect of the 25 Hz offset, it follows that there are literally 283 3⁄4 cycles
of sub carrier to each line. Assuming that line 1 starts at the beginning of a positive half
cycle, the system will need to scan four complete line periods before a subsequent line again
commences with its sub carrier in the same phase with respect to the line sync, thus
establishing a four line sequence. There are 625 lines in a frame, so there are 156 1⁄4 such
sequences per picture. Hence the next frame will start on line 2 of this 4 line sequence. This
means that the sub carrier phase will appear to have shifted with respect to the line sync by
90 degrees. After 2 pictures, 312 1⁄2 four line sequences will have occurred and the third frame
will commence at the beginning of the negative half cycle of the sub carrier waveform. After
4 pictures, 625 four line sequences will have happened (surprise surprise!), i.e. an integer
number of 4 line groups will have taken place and frame 5 will recommence at the beginning
of the next four line sequence. Since this is an engineering phenomenon and video
engineers think in fields, not frames, they refer to this as an 8 field sequence. It is simply a 4
frame sequence; fields look after themselves because there are always two fields per frame.
Field 1 can never be mistaken for a field 2! It would greatly assist operationally based
technicians if this characteristic of the PAL signal was described as a four frame sequence.


There is nothing astoundingly revealing about all this. It was well understood by Dr. Bruch
when developing the PAL system. It also gives rise to a 4 frame repeating dot pattern on a
television display which causes the dot pattern to seem to crawl, yielding a further
impairment on monochrome receivers when displaying so called monochrome compatible
PAL pictures. Dr. Bruch did not, however, specify an absolute s/c to H phase relationship; that
came later.
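
The four frame repetition can be confirmed with exact arithmetic. The sketch below follows the simplification used in the text, taking 283 3⁄4 cycles of sub carrier per line and ignoring the 25 Hz offset; the function names are illustrative.

```python
from fractions import Fraction

CYCLES_PER_LINE = Fraction(1135, 4)   # 283 3/4, ignoring the 25 Hz offset
LINES_PER_FRAME = 625


def start_phase_degrees(frame_index: int) -> Fraction:
    """Sub carrier phase (degrees) at the start of a frame, frame 0 = 0."""
    cycles = CYCLES_PER_LINE * LINES_PER_FRAME * frame_index
    return (cycles % 1) * 360


def frames_until_repeat() -> int:
    """Smallest number of frames containing a whole number of cycles."""
    n = 1
    while start_phase_degrees(n) != 0:
        n += 1
    return n


for frame in range(5):
    print(f"frame {frame + 1}: {float(start_phase_degrees(frame)):.0f} degrees")
print(f"sequence length: {frames_until_repeat()} frames "
      f"({2 * frames_until_repeat()} fields)")
```

Each frame starts a quarter of a cycle away from the previous one, the third frame starts half a cycle out (the negative half cycle), and the fifth frame returns to the starting phase.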


Conclusion 


The list of PAL impairments goes on; no doubt delegates may have other points and further
discussion should arise during the course. Let us not be too critical; the PAL system is ingenious,
reliable, robust and works extremely well. It was designed as a terrestrial colour transmission system
using a single data link, based upon the economical use of bandwidth and cost effective production of the
television receiver. For example, it yielded a major advantage over SECAM in that it could be easily
processed, particularly for mixing in a studio environment. As time moved on, increasing pressure was
imposed on the composite video signal systems to enact more complex operations in terms of image
processing, most of which PAL, NTSC etc. were not originally designed to accommodate. Some
time ago, the industry began to consider an alternative approach, this time concentrating on a method
for image origination and manipulation purposes. Image acquisition therefore could not be conceived
to originate as a composite signal. What was needed was a signal system that could be used to deliver
images onto a recording medium in a form suitable for post production work. Application to
domestic receivers was a secondary consideration; it was felt that transcoding to composite would
only need to take place once, at the transmitter. (How wrong they were!) Experience and improved
technology were pointing in the direction of digital video. CCIR Recommendation 601, as an
internationally agreed digital component standard, was already in operation. The interim step of
Component Analogue Video was forced into being via the notoriously successful Betacam and M2
VTR analogue acquisition formats.


3. Component Coding


There is nothing very complex about component coding since the process is inherent in any
NTSC, PAL or SECAM encoder. However, the approach is to use three separate signal paths
handling separate base band signals, but instead of using red, green and blue video voltages, the
matrixed Y, R-Y and B-Y signals are derived for interconnection. This has a number of implied
advantages to be discussed in more detail later. The proposal therefore was to distribute signals via a
3 wire process which usually takes the form of 3 BNC connectors, although purpose designed multi
pin co-axial connectors are available.


A contributing factor to this approach is that of digital video processing. It has been realised
for some time that the preferred approach to digital video signal distribution is to work in the
component domain using time division multiplex for the three signals. This is defined in the ITU-R
601 specification.


4. Advantages of the Component Approach 


As a result of the previous discussion on impairments of the PAL system, it will be seen from 
what follows, that a number of major advantages become apparent if we operate and process video in 
the component domain. The list below is in no specific order but highlights some of these major 
advantages. 


(i) Absence of Dot Pattern


There is no sub carrier for the chroma signals and therefore there is no dot pattern
present when the Y signal is being used for monochrome display or processing. This
means that there can be no dot crawl either.


(ii) Continuous Band Luminance Channel Working 


Since no sub carrier exists, it is not necessary to use notch filters with a subsequent 
loss of horizontal resolution, giving a considerable improvement and the elimination of a 
major PAL impairment. 


(iii) Minimal Y-CH Cross Talk Problem


Since no frequency interleaving takes place, there can be no cross talk problems due 
to Y-CH interference in the chroma band. The term minimal is used here because a 
component system will still need careful design in terms of the usual distribution crosstalk 
problems that arise with adjacent channels, amplifiers etc. 


(iv) No 8 Field Sequence 


If we are working solely in the component domain then no 8 field sequence exists;
also there is no sub carrier to horizontal sync relationship to worry about. This must be
considered a major operational advantage. However, it seems it will still be necessary to identify
a pseudo 8 field or 4 frame sequence for the foreseeable future because of the hybrid
working that happens in television programme making operations.
(v) No Differential Phase and Gain Problems


Because we are not in the business of "piggy backing" chroma information on to a
luminance signal, phase and gain errors in this context do not arise.


(vi) Relatively Simple to Digitise 


This particular aspect will be discussed later but composite signals are complex to 
digitise. It becomes much easier to digitise components and Recommendation 601 points in 
that direction. 


(vii) Compatibility Between Standards


All European countries use 625/50 line twin interlaced scanning systems. Since 
SECAM and PAL both operate from the modulation of R-Y and B-Y signals, then total 
compatibility will exist at component level amongst all European countries. The 525 line 
standard has a similar active line time to that of the 625 line standard and therefore line time 
compatibility exists in that dimension also. 


(viii) Direct Colour Replay from VTR Without Timebase Correction 


A colour replay from direct demodulator output would be available in component
form such that no time base correction was required for colour monitoring. This proved
useful for the provision of colour replay direct from camera on a location shoot. "Colour
under" processing such as that encountered in U-matic VTR's previously used for ENG
would no longer apply.


Conclusion 


The above advantages were documented on the spot over a fairly brief period.
Probably, other aspects or advantages will arise from discussion. The above is considered to be a
fairly convincing list; it predicts that all signal distribution and origination in professional television
must eventually move to the component domain. There are, however, some disadvantages to be
identified when working with component signals.


5. Disadvantages of CAV


(i) Increased physical complexity 


The concept of three individual signals in a parallel configuration implies circuit
complexity and level balancing difficulties, e.g. installations use easily available VDAs
etc. in triplicate. This is expensive and unwieldy. However, this has improved as purpose
designed units have become available. Nonetheless very few CAV routers exist; the cost
implications are astronomical.


(ii) Three Wire Connection


Three simultaneous channels must be available or three connecting cables to 
monitoring equipment etc. 


(iii) Increased Band Width


If the basic PAL chroma band width is to be matched then we will require 5.5 MHz
for the Y channel plus 1.3 MHz each for the colour difference channels, i.e. 8.1 MHz total
band width capacity at least. Component working implies a desire to improve this further for
reasons explained later.


(iv) Constant Luminance and Monochrome Compatibility 


The principle of constant luminance is obeyed to an even lesser extent than
when composite working. However, the assumption that CAV is monochrome compatible is
completely erroneous. No such mandate exists.


(v) Absolute signal level required 


Maintenance of accurate levels is critical. In PAL, poor maintenance of chroma
levels yields incorrect saturation, which is tolerable. However, inaccurate alignment of
component levels implies hue shifts.


Conclusion 


There are very few disadvantages if considered for image acquisition and post
production. It is good to see that the familiar argument for monochrome compatibility at long last
does not seem to restrict the progress of colour image rendition! Other than Camerapersons, who
seriously watches black and white tele anyway?


6. Component Video Standards 


(i) Colour Bar Signals 


The definition of colour bars remains unchanged since the PAL colour bar
waveforms are derived from the fundamental GBR domain. It is important however to learn
to recognise the form of the RGB, B-Y & R-Y waveforms for any colour bar definition, i.e.
100%, 95%, EBU & 75% bars.


(ii) Under normalised amplitude conditions:- 
G = B = R = 700mV peak 


This defines the unity (100%) amplitude GBR domain. From this are derived the Y,
R-Y & B-Y signals for the 100% saturation, 100% amplitude colour bar waveform.


The G, B and R signals are then matrixed to yield the Y, R-Y and B-Y signals. R-Y
and B-Y are defined as the colour difference signals. Remember that the third colour
difference signal (G-Y) is discarded.


The EBU Document N10 defines levels for the component waveforms to yield the
weighted colour difference signals thus:- 


E'CR = 0.713 (E'R - E'Y)


E'CB = 0.564 (E'B - E'Y)


This is the true CAV domain. The normal EBU N10 level CAV signals are 
generally referred to as :- 


Y, CB, CR


(iii) The Question of Levels


In accordance with the above definitions, Document N10 says that:- 
Y = 700 mV P-P
CB = 700 mV P-P
CR = 700 mV P-P


No tolerances are specified but since it is normal to aim at G = 700 ± 20 mV, with
B and R to within ± 5 mV of G, in the RGB domain, it would seem reasonable to use
the same tolerances, i.e.:-


CB ± 5 mV w.r.t. Y
CR ± 5 mV w.r.t. Y


If EBU, 95% BBC or 75% bars are to be considered then the above CR and CB levels
would drop to 525 mV P-P. Sony used 75% bars as their starting point, i.e. the B-Y and
R-Y signals derived from 75% bars were normalised to 700 mV P to P, giving different
weighting factors to those used for N10. This early decision by Sony was based upon the
dubious supposition that nobody used 100% bars!! Also the original Betacam format was
intended to deliver composite signals for the interconnection of equipment. There was only
one Video In BNC socket on the back panel of the BVW 40 Standard Betacam VTR. The
original Sony levels are now obsolete.
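
The 700 mV and 525 mV figures can be checked with a short calculation. The following is a minimal sketch only: it assumes the usual luminance coefficients (0.299, 0.587, 0.114), which are not quoted in this section, and the function name bar_levels is purely illustrative.

```python
# Luminance coefficients (assumed, not quoted in the text) and the
# N10 weighting factors given above.
KR, KG, KB = 0.299, 0.587, 0.114
W_CB, W_CR = 0.564, 0.713

def bar_levels(high=1.0, low=0.0, peak_mv=700.0):
    """Peak-to-peak CB and CR in mV for colour bars whose G, B and R
    signals switch between `low` and `high` (fractions of peak_mv)."""
    cb, cr = [], []
    for g in (low, high):
        for r in (low, high):
            for b in (low, high):
                y = KR * r + KG * g + KB * b
                cb.append(W_CB * (b - y) * peak_mv)
                cr.append(W_CR * (r - y) * peak_mv)
    return max(cb) - min(cb), max(cr) - min(cr)

cb100, cr100 = bar_levels(high=1.0)    # 100% bars
cb75, cr75 = bar_levels(high=0.75)     # colour bars at 75% amplitude
print(round(cb100), round(cr100), round(cb75), round(cr75))
```

With these assumptions both weighted signals come out within a millivolt of 700 mV P-P for 100% bars and of 525 mV P-P for 75% bars.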


7. Correct Terminology


Much confused and incorrect terminology is used to describe the component domain. The
delegate must judge for herself or himself what terminology to use. In making that decision, the
following observations should be heeded.


The terms in common use are:-


a) Y, R-Y, B-Y


This is a common label used to describe N10 component signals. It is an accepted 
title but is not strictly correct in as much as raw unweighted colour difference signals are not 
used for component distribution. For 100% colour bars the colour component amplitudes 
would be :- 


B-Y = 1.24V P to P
R-Y = 0.98V P to P 


These are not N10 levels.


b) Y, E'CR, E'CB


This is strictly correct as defined by the EBU in Document N10, but is rather a
mouthful. It is hardly ever used, except in formal documentation.


c) Y, CB, CR


The original CCIR Document Recommendation 601 defines these symbols to 
describe the component digital levels. Mathematically:- 


E'CB = CB = 700 mV, and E'CR = CR = 700 mV


Therefore many people tend to use this nomenclature as a shortened version that rolls fairly
easily off the tongue. Since the defined weighting factors equate, the terminology is
acceptable. It is favoured by the author and is generally seen to be a very satisfactory way to
accurately describe the CAV signals verbally without confusion.


d) Y, U, V


It is unfortunately common to describe the N10 component signals in this way. 
Indeed many equipment manufacturers use this terminology, (YUV generators etc.). It is 
strictly incorrect and should not be used. If U and V signals are sought for 100% colour bars, 
then the following voltages should be delivered. 


U=0.614V P to P 
V = 0.86V P to P


Neither are these N10 levels. 


The symbols U and V are used to refer to the weighted PAL baseband signals. In
fact there is frequent debate about the definition of U and V in the PAL domain. It is often
argued that U and V signals describe the RF modulated envelopes on the defined U and V
Vectorscope axes. EBU documentation is quite clear on this however. One could argue that
in France, since they are not PAL oriented, they would describe their component signal paths
as Y, DR, DB, these being the SECAM components. Most engineers would consider this
ridiculous because DR and DB signals are defined as 2 Volts P-P!!


e) Y, Pb, Pr


This is an attempt by VTR manufacturers to define a unique terminology for the CAV
domain. It tends however to suggest the playback outputs from the VTR, which indeed it is.
They need not necessarily be N10 levels! Neither are CAV signals exclusive to video
recorders; consider a camera and a signal router for example. It is in common use amongst
system engineers and is often found on jackfields. They mean Y, CB, CR of course, which is
becoming the most favoured terminology.


f) Ch A, Ch B, Ch C or Ch 1, Ch 2, Ch 3


Used by test equipment manufacturers, they follow the YUV order and account for
the modern preference to refer to GBR rather than RGB. The following table is a summary of
the preceding discussion.


Terminology            Description
Y, R-Y, B-Y            Colour difference signals
Y, E'CR, E'CB          CAV N10 components
Ch 1, Ch 2, Ch 3       Test equipment etc. (syncs on Ch 1)
Ch A, Ch B, Ch C       Test equipment etc. (syncs on Ch A)


8. The Question of Band Width


No specific statements have been made about the required band widths for the 
colour difference signals in the analogue component domain. There are a number of possibilities that 
are worth considering. 


i). PAL Band widths


Some may argue that if you are using CAV VTRs as islands in a composite
environment, then the band width specification for the CR and CB signals should be restricted
to that required by PAL, i.e. approximately 1.3 MHz.


ii). Sony Betacam Band widths


Others may argue that since Sony seem to be the main originators of the component
analogue domain (unintentionally: they designed an ENG camcorder system for light
weight location use as the first dedicated ENG VTR format to operate in the composite
domain), then we need only worry about designing the system to accommodate the Betacam
chroma bandwidths. The trouble is that it keeps changing. The chroma band width in the
early Sony Betacam systems was approximately 1.5 MHz. It is now claimed to be in excess of
2.0 MHz.


iii). Full Band Width


For signal origination and distribution in studio complexes it is necessary to
accommodate full band width for each of the three channels, i.e. in excess of 5.5 MHz for
the CB and CR channels also. The colour component signals must be treated in the same way as
the luminance signal. If the chroma channels are seen to be less demanding than the
luminance channel then ad hoc inclusion of limited pass band devices (by the use of inferior
grade cable for the chroma channels for example) would introduce variations in path
propagation times with the consequent timing errors and Y/C mis-registration. Y/C timing
measurement for CAV is crucial to ensure good quality results.


iv). ITU-R Recommendation 601 


ITU-R Recommendation 601 is the document that specifies the parameters for the 
digitisation of component video signals. This has implications in terms of required band 
widths for analogue component use. Recommendation 601 is examined in some detail later 
in the course. It basically implies that the luminance can operate at a bandwidth a little 
greater than 5.5 MHz and the chroma channels will operate at various sub multiples of this 
figure dependent upon the mode that the system is using. The ITU-R Specification implies
that the chroma bandwidths can vary over the extensible family of digital standards that
make up Recommendation 601, particularly in a digital facility house.


Conclusion 


The three component paths must be matched in terms of bandwidth and propagation
time, i.e. cable lengths. If this criterion is not obeyed, severe problems can be expected!


© JOHN LISNEY 
SEPTEMBER 1994 
[Figure: semi-circular rotary camera shutter]


DIGITAL VIDEO CONVERSION
1. Introduction to conversion 


There are a number of ways in which a video waveform can be 
digitally represented, but the most useful and therefore common is 
Pulse Code Modulation or PCM. The input is a continuous-time, 
continuous-voltage video waveform, and this is converted into a 
discrete-time, discrete-voltage format by a combination of sampling 
and quantizing. As these two processes are orthogonal they are 
totally independent and can be performed in either order. Ideally, 
both orders will give the same results; in practice each has 
different advantages and suffers from different deficiencies. The 
second approach is more common in video equipment. 

The independence of sampling and quantizing allows each to be 
discussed quite separately in some detail, prior to combining the 
processes for a full understanding of conversion. 

Whilst sampling an analog video waveform takes place in the time 
domain in an electrical ADC, this is because the analog waveform is 
the result of scanning an image. In reality the image has been 
spatially sampled in two dimensions (lines and pixels) and temporally
sampled into fields along a third dimension. Sampling in a single 
dimension will be considered before moving on to more dimensions. 


2. Sampling and aliasing 


Sampling is no more than periodic measurement, and it will be 
shown here that there is no theoretical need for sampling to be 
detectable. However, practical television equipment often falls short 
of the ideal, particularly in the case of temporal sampling. 

Video sampling must be regular, because the process of timebase 
correction prior to conversion back to a conventional analog waveform 
assumes a regular original process. The sampling process originates 
with a pulse train of constant amplitude and period. The video 
waveform amplitude-modulates the pulse train in much the same way as 
the carrier is modulated in an AM radio transmitter. One must be 
careful to avoid over-modulating the pulse train and this is achieved 
by applying a DC offset to the analog waveform so that blanking 
corresponds to a level part-way up the pulses. 

In the same way that AM radio produces sidebands or images above 
and below the carrier, sampling also produces sidebands although the 
carrier is now a pulse train and has an infinite series of harmonics. 
The sidebands repeat above and below each harmonic of the sampling 
rate. 

The sampled signal can be returned to the continuous-time domain
simply by passing it into a low-pass filter. This filter has a 
frequency response which prevents the images from passing, and only 
the baseband signal emerges, completely unchanged. If considered in 
the frequency domain, this filter can be called an anti-image filter; 
if considered in the time domain it can be called a reconstruction 
filter. It can also be considered as a spatial filter if a sampled 
still image is being returned to a continuous image. Such a filter 
will be two-dimensional. 


If an input is supplied having an excessive bandwidth for the 
sampling rate in use, the sidebands will overlap and the result is 
aliasing, where certain output frequencies are not the same as their 
input frequencies but instead become difference frequencies. Aliasing 
does not occur when the input frequency is equal to or less than half 
the sampling rate, and this derives the most fundamental rule of 
sampling, which is that the sampling rate must be at least twice the 
highest input frequency. Sampling theory is usually attributed to 
Shannon who applied it to information theory at around the same time
as Kotelnikov in Russia. Despite that it is often referred to as
Nyquist's theorem.
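
The folding of an out-of-band input into a difference frequency can be sketched as follows. This is an illustration only, assuming ideal sampling followed by an ideal low-pass filter at half the sampling rate; the function name alias_frequency is hypothetical.

```python
def alias_frequency(f_in, f_s):
    """Frequency at which an input of f_in appears after sampling at f_s
    and ideal low-pass reconstruction at f_s / 2 (same units throughout)."""
    f = f_in % f_s                 # the spectrum repeats at every multiple of f_s
    return f if f <= f_s / 2 else f_s - f

# Frequencies in MHz, with a 13.5 MHz sampling rate chosen for illustration.
print(alias_frequency(5.0, 13.5))  # below half the sampling rate: unchanged
print(alias_frequency(8.0, 13.5))  # above it: emerges as 13.5 - 8.0
```

An 8 MHz input sampled at 13.5 MHz therefore emerges at the difference frequency of 5.5 MHz, inside the wanted band, which is why it cannot be removed afterwards.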

One often has no control over the spectrum of input signals and 
in practice it is necessary also to have a low-pass filter at the 
input to prevent aliasing. This anti-aliasing filter prevents 
frequencies of more than half the sampling rate from reaching the 
sampling stage. The requirement for an anti-aliasing filter extends 
to two-dimensional sampling devices such as CCD sensors. 

Whilst electrical or optical anti-aliasing filters are quite 
feasible, there is no corresponding device which can precede the 
image sampling at frame or field rate in film or TV cameras and as a 
result aliasing is commonly seen on television and in the cinema, 
owing to the relatively low frame rates used. With a frame rate of 
24Hz, a film camera will alias on any object changing at more than 
12Hz. Such objects include the spokes of stagecoach wheels; when the 
spoke-passing frequency reaches 24Hz the wheels appear to stop. 
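
The stagecoach wheel can be treated in the same way. The sketch below is a simplified model which assumes identical spokes and an instantaneous exposure; the function name and the sign convention (negative meaning apparent backward rotation) are illustrative choices.

```python
def apparent_spoke_rate(spoke_rate_hz, frame_rate_hz=24.0):
    """Signed apparent spoke-passing rate after sampling at the frame rate.
    Positive: wheel appears to turn forwards. Negative: backwards."""
    half = frame_rate_hz / 2
    return (spoke_rate_hz + half) % frame_rate_hz - half

for rate in (6.0, 18.0, 24.0, 30.0):
    print(rate, "Hz appears as", apparent_spoke_rate(rate), "Hz")
```

At 18 Hz the wheel appears to run backwards at 6 Hz, at 24 Hz it appears stationary, and at 30 Hz it appears to creep forwards again at 6 Hz.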

The impulse response of a phase linear ideal low-pass filter is 
a sinx/x waveform in the time domain. Such a waveform passes through 
zero volts periodically. If the cut-off frequency of the filter is 
one-half of the sampling rate, the impulse passes through zero at the 
sites of all other samples. At the output of such a filter, the 
voltage at the centre of a sample is due to that sample alone, since 
the value of all other samples is zero at that instant. In other 
words the continuous time output waveform must join up the tops of 
the input samples. In between the sample instants, the output of the 
filter is the sum of the contributions from many impulses, and the 
waveform smoothly joins the tops of the samples. If the time domain 
is being considered, the anti-image filter of the frequency domain 
can equally well be called the reconstruction filter. It is a 
consequence of the band-limiting of the original anti-aliasing filter 
that the filtered analog waveform could only travel between the 
sample points in one way. As the reconstruction filter has the same 
frequency response, the reconstructed output waveform must be 
identical to the original band-limited waveform prior to sampling. 
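
The way in which the sinx/x impulses join up the tops of the samples can be sketched numerically. This is a minimal illustration with time measured in sample periods and a short, arbitrary list of sample values; a real filter would of course be finite and causal.

```python
import math

def sinc(x):
    """sin(pi x) / (pi x), taking the limiting value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, t):
    """Output at time t (in sample periods) of an ideal low-pass filter
    whose cut-off is half the sampling rate, fed with impulses."""
    return sum(s * sinc(t - n) for n, s in enumerate(samples))

samples = [0.0, 0.3, 0.7, 0.4, -0.2, -0.5, 0.1]   # arbitrary example values
print(reconstruct(samples, 2))     # at a sample instant: that sample alone
print(reconstruct(samples, 2.5))   # between samples: a sum of many impulses
```

At t = 2 every other impulse passes through zero, so the output equals the third sample (to within floating point rounding); at t = 2.5 all of the impulses contribute.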

The ideal filter with a vertical "brick-wall" cut-off slope is 
difficult to implement. As the slope tends to vertical, the delay 
caused by the filter goes to infinity. In practice, a filter with a 
finite slope has to be accepted. The cut-off slope begins at the edge 
of the required band, and consequently the sampling rate has to be 
raised a little to drive aliasing products to an acceptably low 
level. 




3. Aperture effect 


The reconstruction process only operates exactly as described if
the impulses are of negligible duration. In many processes this is
not the case, and many real devices keep the analog signal constant 
for a substantial part of or even the whole period. The result is a 
waveform which is more like a staircase than a pulse train. The case 
where the pulses have been extended in width to become equal to the 
sample period is known as a zero-order hold system and has a 100% 
aperture ratio. 

Pulses of 100% aperture ratio have a sinx/x spectrum. The 
frequency response falls to a null at the sampling rate, and as a
result is about 4dB down at the edge of the baseband. If the pulse 
width is stable, the reduction of high frequencies is constant and 
predictable, and an appropriate equalisation circuit can render the 
overall response flat once more. An alternative is to use resampling 
which passes the zero-order hold waveform through a further 
synchronous sampling stage consisting of an analog switch which 
closes briefly in the centre of each sample period. The output of the 
switch will be pulses which are narrower than the original. If, for 
example, the aperture ratio is reduced to 50% of the sample period, 
the first frequency response null is now at twice the sampling rate, 
and the loss at the edge of the band is reduced. The frequency 
response becomes flatter as the aperture ratio falls. 
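
The aperture loss can be estimated from the sinx/x response. The following sketch assumes ideal rectangular pulses; the function name aperture_loss_db is illustrative.

```python
import math

def aperture_loss_db(f_over_fs, aperture_ratio=1.0):
    """Loss in dB at frequency f (expressed as a fraction of the sampling
    rate) for rectangular pulses lasting aperture_ratio of a sample period."""
    x = math.pi * f_over_fs * aperture_ratio
    gain = 1.0 if x == 0 else math.sin(x) / x
    return -20 * math.log10(abs(gain))

# Edge of the baseband is at half the sampling rate.
print(round(aperture_loss_db(0.5, 1.0), 2))   # zero-order hold, 100% aperture
print(round(aperture_loss_db(0.5, 0.5), 2))   # resampled to 50% aperture
```

Under these assumptions the loss at the band edge is a little under 4 dB for a 100% aperture and falls to under 1 dB when the aperture ratio is reduced to 50%.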


4. Choice of sampling rate --- component 


If the reason for digitising a video signal is simply to convey 
it from one place to another, then the choice of sampling frequency 
can be determined only by sampling theory and available filters. If,
however, processing of the video in the digital domain is
contemplated, the choice becomes smaller. In order to produce a two 
dimensional array of samples which form rows and vertical columns, 
the sampling rate has to be an integer multiple of the line rate. 

This allows for the vertical picture processing necessary in special 
effects, data reduction, error concealment in recorders and standards
conversion. Whilst the bandwidth needed by 525/59.94 video is less
than that of 625/50, and a lower sampling rate might be used, 
practicality dictated that if a standard sampling rate for video 
components could be arrived at, then the design of standards 
convertors would be simplified, and digital recorders would operate 
at a similar data rate even though the frame rates would differ in 
different standards. This was the goal of CCIR Recommendation 601, 
which combined the 625/50 input of EBU Docs. Tech. 3246 and 3247 and 
the 525/59.94 input of SMPTE RP 125. 

The result is not one sampling rate, but a family of rates based 
upon the magic frequency of 13.5MHz. 

Using this frequency as a sampling rate produces 858 samples in 
the line period of 525/59.94 and 864 samples in the line period of 
625/50. For lower bandwidths, the rate can be multiplied by three
quarters, one half or one quarter to give sampling rates of 10.125,
6.75 and 3.375MHz respectively. If the lowest frequency is considered
to be 1, then the highest is 4. For maximum quality RGB working, then
three parallel, identical sample streams would be required, which
would be denoted by 4:4:4. Colour difference signals intended for
post production, where a wider colour difference bandwidth is needed,
require 4:2:2 sampling for luminance, R-Y and B-Y respectively. 4:2:2
has the advantage that an integer number of colour difference samples
also exist in both line standards. In 4:2:2 sampling luminance 
samples appear at half the spacing of colour difference samples, and 
half of the luminance samples are in the same physical position as a 
pair of colour difference samples, these being called co-sited 
samples. The D-1, D-5 and Digital Betacam recording formats work with 
4:2:2 sampling. 
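
The sample counts quoted above can be verified with exact arithmetic. This sketch assumes the standard line rates of 15 625 Hz for 625/50 and 4.5 MHz / 286 for 525/59.94, neither of which is stated in the text.

```python
from fractions import Fraction

FS = Fraction(13_500_000)        # luminance sampling rate in Hz

# Line rates in Hz (assumed standard values).
LINE_RATE = {
    "625/50": Fraction(15_625),
    "525/59.94": Fraction(4_500_000, 286),
}

for name, f_line in LINE_RATE.items():
    per_line = FS / f_line       # luminance samples in one line period
    print(name, per_line, "luminance samples,",
          per_line / 2, "per colour difference channel in 4:2:2")

# The lower members of the family of rates, in MHz.
family_mhz = [float(FS * k) / 1e6 for k in (Fraction(3, 4), Fraction(1, 2), Fraction(1, 4))]
print(family_mhz)
```

Both standards give a whole number of luminance samples per line (864 and 858), and halving the rate for the colour difference channels still leaves a whole number in each case.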

Where the signal is likely to be broadcast as PAL or NTSC, a
standard of 4:1:1 is acceptable, since this still delivers a colour
difference bandwidth in excess of 1MHz. Where data rate is at a
premium, 3:1:1 can be used, and can still offer just about enough
bandwidth for 525 lines. This would not be enough for 625 line 
working, but would be acceptable for ENG applications. The problem 
with the factors three and one is that they do not offer a columnar 
sampling structure, and so are not appropriate for processing 
systems. 

The conventional TV screen has an aspect ratio of 4:3, whereas 
in the future an aspect ratio of 16:9 may be adopted. Expressing 4:3 
as 12:9 makes it clear that the 16:9 picture is 16/12 or 4/3 times as 
wide. There are two ways of handling 16:9 pictures in the digital 
domain. One is to retain the standard sampling rate of 13.5MHz, which
results in the horizontal resolution falling to 3/4 of its previous
value, the other is to increase the sampling rate in proportion to
the screen width. This results in a luminance sampling rate of 13.5 x
4/3 MHz or 18.0MHz.


5. Choice of sampling rate --- composite 


When composite video is to be digitised, the input will be a
single waveform having spectrally interleaved luminance and chroma.
Any sampling rate which allows sufficient bandwidth would convey
composite video from one point to another, indeed 13.5MHz has been
successfully used to sample PAL and NTSC. However, if simple
processing in the digital domain is contemplated, there will be less
choice.

In many cases it will be necessary to decode the composite 
signal which will require some kind of digital filter. Whilst it is 
possible to construct filters with any desired response, it is a fact 
that a digital filter whose response is simply related to the 
sampling rate will be much less complex to implement. This is the 
reasoning which has led to the near universal use of four times 
subcarrier sampling rate. If PAL or NTSC are sampled at 4xFsc, there 
is a considerable space between the edge of the baseband and the 
lower sideband. This allows the anti-aliasing and reconstruction 
filters to have a more gradual cut-off, so that ripple in the 
passband can be reduced. This is particularly important for C-format 
timebase correctors and for composite digital recorders, since both 
are digital devices intended for use in an analog environment, and 
signals may have been converted to and from the digital domain many 
times in the course of production. 
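
The space available to the filters at 4xFsc can be estimated as below. This is a rough sketch: the subcarrier frequencies are the standard values and the nominal bandwidths of 5.5 MHz (PAL) and 4.2 MHz (NTSC) are assumptions made for illustration, none of them being quoted in the text.

```python
from fractions import Fraction

# Subcarrier frequencies in Hz (standard values, assumed here).
FSC = {"PAL": Fraction(443_361_875, 100), "NTSC": Fraction(315_000_000, 88)}
# Nominal video bandwidths in Hz (assumed for illustration).
BANDWIDTH = {"PAL": 5_500_000, "NTSC": 4_200_000}

def guard_band(system):
    """Return (sampling rate, gap between the edge of the baseband and the
    bottom of the lower sideband), both in Hz, for 4 x Fsc sampling."""
    fs = 4 * FSC[system]
    lower_sideband_edge = fs - BANDWIDTH[system]
    return fs, lower_sideband_edge - BANDWIDTH[system]

for system in ("PAL", "NTSC"):
    fs, gap = guard_band(system)
    print(system, round(float(fs) / 1e6, 3), "MHz sampling,",
          round(float(gap) / 1e6, 2), "MHz gap")
```

With these figures the gap is several megahertz in both systems, which is what allows the gentler filter slopes described above.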


6. Sampling clock jitter 


The instants at which samples are taken in an ADC and the 
instants at which DACs make conversions must be evenly spaced, 
otherwise unwanted signals can be added to the video. The effect of
sampling clock jitter on a sloping waveform is that samples are taken
at the wrong times. When these samples have passed through a system,
the timebase correction stage prior to the DAC will remove the 
jitter, and the result is amplitude errors. The magnitude of the 
error is proportional to the slope of the waveform and so the amount 
of jitter which can be tolerated falls at 6dB per octave. As the 
resolution of the system is increased by the use of longer sample 
wordlength, tolerance to jitter is further reduced. The nature of the 
unwanted signal depends on the spectrum of the jitter. If the jitter 
is random, the effect is noise-like and relatively benign unless the 
amplitude is excessive. Note that even small amounts of jitter can 
degrade a 10bit convertor to the performance of a good 8bit unit. 
There is thus no point in upgrading to higher resolution convertors 
if the clock stability of the system is insufficient to allow their 
performance to be realised. 
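
An order of magnitude for the tolerable jitter can be obtained from the slope argument. The sketch below assumes a full-scale sine wave and takes, as an arbitrary criterion, an amplitude error of no more than half a quantizing interval; other criteria will give different figures.

```python
import math

def max_jitter_seconds(bits, f_hz):
    """Largest timing error which keeps the amplitude error within half a
    quantizing interval on a full-scale sine wave of frequency f_hz."""
    # A full-scale sine spans 2**bits intervals peak-to-peak, so its
    # steepest slope is pi * f * 2**bits intervals per second.
    max_slope = math.pi * f_hz * 2 ** bits
    return 0.5 / max_slope

for bits in (8, 10):
    print(bits, "bits:", round(max_jitter_seconds(bits, 5.5e6) * 1e12), "ps")
```

Doubling the frequency halves the allowable jitter (6dB per octave), and each extra bit of wordlength halves it again, giving figures of the order of 100 ps for 8 bits and a quarter of that for 10 bits at 5.5 MHz.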

The allowable jitter is measured in picoseconds, and clearly 
steps must be taken to eliminate it by design. Convertor clocks must 
be generated from clean power supplies which are well decoupled from 
the power used by the logic because a convertor clock must have a 
signal to noise ratio of the same order as that of the signal. 
Otherwise noise on the clock causes jitter which in turn causes noise 
in the video. The same effect will be found in digital audio signals, 
which are perhaps more critical. 


7. Quantizing


Quantizing is the process of expressing some infinitely variable 
quantity by discrete or stepped values. Quantizing turns up in a
remarkable number of everyday guises. An inclined ramp enables 
infinitely variable height to be achieved, whereas a step-ladder 
allows only discrete heights to be had. A step-ladder quantizes 
height. When accountants round off sums of money to the nearest pound 
or dollar they are quantizing. Time passes continuously, but the 
display on a digital clock changes suddenly every minute because the 
clock is quantizing time. 

In video and audio the values to be quantized are infinitely 
variable voltages from an analog source. Strict quantizing is a 
process which operates in the voltage domain only. For the purpose of 
studying the quantizing of a single sample, time is assumed to stand 
still. This is achieved in practice either by the use of a track-hold 
circuit or the adoption of a quantizer technology such as a flash 
convertor which operates before the sampling stage. 

The process of quantizing divides the voltage range up into 
quantizing intervals Q, also referred to as steps S. In applications 
such as telephony these may advantageously be of differing size, but 
for digital video the quantizing intervals are made as identical as 
possible. If this is done, the binary numbers which result are truly
proportional to the original analog voltage, and the digital
equivalents of mixing and gain changing can be performed by adding
and multiplying sample values. If the quantizing intervals are
unequal this cannot be done. When all quantizing intervals are the 
same, the term uniform quantizing is used. 

Whatever the exact voltage of the input signal, the quantizer 
will locate the quantizing interval in which it lies. In what may be 
considered a separate step, the quantizing interval is then allocated 
a code value which is typically some form of binary number. The 
information sent is the number of the quantizing interval in which 
the input voltage lay. Whereabouts that voltage lay within the 
interval is not conveyed, and this mechanism puts a limit on the 
accuracy of the quantizer. When the number of the quantizing interval 
is converted back to the analog domain, it will result in a voltage
at the centre of the quantizing interval as this minimises the 
magnitude of the error between input and output. The number range is 
limited by the wordlength of the binary numbers used. In an eight-bit 
system, 256 different quantizing intervals exist, although in digital 
video the ones at the extreme ends of the range are reserved for 
synchronizing. 
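
A uniform quantizer and ideal DAC of this kind can be sketched in a few lines. The interval size of 3 mV is an arbitrary value chosen for illustration, and the sketch ignores the reserved codes at the ends of the range.

```python
import math

def quantize(voltage, q, bits=8):
    """Number of the quantizing interval in which `voltage` lies. Interval 0
    is centred on 0 V, and the result is clipped to the code range."""
    code = math.floor(voltage / q + 0.5)
    return max(0, min(2 ** bits - 1, code))

def to_voltage(code, q):
    """Ideal DAC: the voltage at the centre of the quantizing interval."""
    return code * q

Q = 0.003                          # quantizing interval in volts (assumed)
for v in (0.1000, 0.4000, 0.6999):
    code = quantize(v, Q)
    print(v, "->", code, "->", round(to_voltage(code, Q), 4))
```

Whereabouts the input lay within the interval is lost, so the output can differ from the input by up to half an interval in either direction.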

It is possible to draw a transfer function for such an ideal 
quantizer followed by an ideal DAC. This is somewhat like a 
staircase, and blanking level is half way up a quantizing interval, 
or on the centre of a tread. This is the so-called mid-tread 
quantizer which is universally used in video and audio. 

Quantizing causes a voltage error in the sample which is given 
by the difference between the actual staircase transfer function and 
the ideal straight line. This is a sawtooth like function which is 
periodic in Q. The amplitude cannot exceed +/-1/2Q (that is, Q peak-to-peak)
unless the input is so large that clipping occurs.

The quantizing error waveform can be thought of as an unwanted 
signal which the quantizing process adds to the perfect original. If 
a very small input signal remains within one quantizing interval, the
quantizing error is the signal.

As the transfer function is non-linear, ideal quantizing can 
cause distortion. As a result practical digital video equipment 
deliberately uses non-ideal quantizers to achieve linearity. 

As the magnitude of the quantizing error is limited, its effect 
can be minimised by making the signal larger. This will require more 
quantizing intervals and more bits to express them. The number of 
quantizing intervals multiplied by their size gives the quantizing 
range of the convertor. A signal outside the range will be clipped. 
Provided that clipping is avoided, the larger the signal the less 
will be the effect of the quantizing error. 

Where the input signal exercises the whole quantizing range and 
has a complex waveform (such as from a contrasty, detailed scene), 
successive samples will have widely varying numerical values and the 
quantizing error on а given sample will be independent of that on 
others. In this case the size of the quantizing error will be 
distributed with equal probability between the limits. In this case 
the unwanted signal added by quantizing is an additive broadband 
noise uncorrelated with the signal, and it is appropriate in this 
case to call it quantizing noise. This is not quite the same as 
thermal noise which has a Gaussian (bell-shaped) probability. The 
difference is of no consequence as in the large signal case the noise 
is masked by the signal. Under these conditions, a meaningful 
signal-to-noise ratio can be calculated by taking the ratio between
the largest signal amplitude which can be accommodated without
clipping and the error amplitude. By way of example, an 8bit system
will offer very nearly 50 dB SNR.
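
The 50 dB figure can be reproduced under one common convention. The sketch below assumes the signal is a sine wave occupying the whole quantizing range and compares its rms value with the rms value of an error distributed uniformly over one quantizing interval; other conventions (peak-to-peak signal to rms noise, for example) give different numbers.

```python
import math

def quantizing_snr_db(bits):
    """Ratio in dB of the rms value of a full-range sine wave to the rms
    value of a quantizing error uniformly distributed over +/- Q/2."""
    q = 1.0
    signal_rms = (2 ** bits * q / 2) / math.sqrt(2)   # sine filling the range
    noise_rms = q / math.sqrt(12)                     # uniform error
    return 20 * math.log10(signal_rms / noise_rms)

for bits in (8, 10):
    print(bits, "bits:", round(quantizing_snr_db(bits), 1), "dB")
```

Each additional bit doubles the number of quantizing intervals and so adds about 6 dB.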

Whilst the above result is true for a large complex input
waveform, treatments which then assume that quantizing error is
always noise give results which are at variance with reality. The
expression above is only valid if the probability density of the
quantizing error is uniform. Unfortunately at low depths of
modulation, and particularly with flat fields or simple pictures,
this is not the case.

At low modulation depth, quantizing error ceases to be random, 
and becomes a function of the input waveform and the quantizing 
structure. Once an unwanted signal becomes a deterministic function 
of the wanted signal, it has to be classed as a distortion rather 
than a noise. Distortion can also be predicted from the 
non-linearity, or staircase nature, of the transfer function. With a 
large signal, there are so many steps involved that we must stand
well back, and a staircase with 256 steps appears to be a slope. With 
a small signal there are few steps and they can no longer be ignored. 

The effect can be visualised readily by considering a television 
camera viewing a uniformly painted wall. The geometry of the lighting 
and the coverage of the lens means that the brightness is not 
absolutely uniform, but falls slightly at the ends of the TV lines.
After quantizing, the gently sloping waveform is replaced by one
which stays at a constant quantizing level for many sampling periods
and then suddenly jumps to the next quantizing level. The picture
then consists of areas of constant brightness with steps between,
resembling nothing more than a contour map, hence the use of the term
contouring to describe the effect.
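
Contouring is easily demonstrated numerically. In the sketch below the line length of 720 samples, the starting level and the fall of three quantizing intervals across the line are all arbitrary values chosen for illustration.

```python
import math

SAMPLES = 720                      # samples across the line (assumed)
# A gentle ramp falling by only three quantizing intervals across the
# line, with the quantizing interval taken as 1.
ramp = [100.4 - 3.0 * n / (SAMPLES - 1) for n in range(SAMPLES)]
codes = [math.floor(v + 0.5) for v in ramp]       # ideal quantizer

levels = sorted(set(codes))
# The ramp is monotonic, so each code forms one unbroken run.
longest_run = max(codes.count(c) for c in levels)
print(levels, "longest run:", longest_run, "samples")
```

The smooth ramp is reduced to four flat areas, the widest of which lasts for roughly a third of the line before the output suddenly jumps to the next level.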

Needless to say, the occurrence of contouring precludes the use
of an ideal quantizer for high quality work. There is little point in
studying the adverse effects further as they should be and can be
eliminated completely in practical equipment by the use of dither. 
The importance of correctly dithering a quantizer cannot be 
emphasized enough, since failure to dither irrevocably distorts the 
converted signal: there can be no process which will subsequently 
remove that distortion. 


8. Introduction to dither 


At high signal levels, quantizing error is effectively noise. As
the depth of modulation falls, the quantizing error of an ideal 
quantizer becomes more strongly correlated with the signal and the 
result is distortion, visible as contouring. If the quantizing error 
can be decorrelated from the input in some way, the system can remain 
linear but noisy. Dither performs the job of decorrelation by making 
the action of the quantizer unpredictable and gives the system a 
noise floor like an analog system. 

All practical digital video systems use nonsubtractive dither
where the dither signal is added prior to quantization and no attempt
is made to remove it at the DAC. The introduction of dither prior to
a conventional quantizer inevitably causes a slight reduction in the
signal to noise ratio attainable, but this reduction is a small price
to pay for the elimination of non-linearities.

The ideal (noiseless) quantizer has fixed quantizing intervals
and must always produce the same quantizing error from the same
signal. An ideal quantizer can be dithered by linearly adding a
controlled level of noise either to the input signal or to the
reference voltage which is used to derive the quantizing intervals.
There are several ways of considering how dither works, all of which
are equally valid.

The addition of dither means that successive samples effectively
find the quantizing intervals in different places on the voltage
scale. The quantizing error becomes a function of the dither, rather
than a predictable function of the input signal. The quantizing error
is not eliminated, but the subjectively unacceptable distortion is
converted into a broadband noise which is more benign to the viewer.

Consider the situation where a low level input signal is
changing slowly within a quantizing interval. Without dither, the
same numerical code is output for a number of sample periods, and the
variations within the interval are lost. Dither has the effect of
forcing the quantizer to switch between two or more states. The
higher the voltage of the input signal within a given interval, the
more probable it becomes that the output code will take on the next
higher value. The lower the input voltage within the interval, the
more probable it is that the output code will take the next lower
value. The dither has resulted in a form of duty cycle modulation,
and the resolution of the system has been extended indefinitely
instead of being limited by the size of the steps.
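
The duty cycle effect described above can be sketched as follows. This is an illustration under stated assumptions: the dither is taken to be rectangular noise of one quantizing interval peak-to-peak, which the notes do not specify, and the function names and the input value of 10.3 are hypothetical.

```python
import math
import random

def quantize(v):
    """Ideal quantizer with unit steps: code of the nearest level."""
    return math.floor(v + 0.5)

def mean_output(v, dither_pp, n=20000, seed=1):
    """Average output code over n samples of a constant input v.

    dither_pp is the peak-to-peak amplitude, in quantizing intervals,
    of rectangular dither added ahead of the quantizer (an assumed
    distribution, chosen only for this sketch).
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        total += quantize(v + rng.uniform(-dither_pp / 2, dither_pp / 2))
    return total / n

undithered = mean_output(10.3, 0.0)  # the quantizer never leaves code 10
dithered = mean_output(10.3, 1.0)    # switches between codes 10 and 11

print(undithered, round(dithered, 2))
```

Without dither the position of the input within the interval is lost. With dither the proportion of samples taking the higher code follows the input level, so the average of the output codes approaches the true input value.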

Dither can also be understood by considering what it does to the
transfer function of the quantizer. This is normally a perfect
staircase, but in the presence of dither it is smeared horizontally
until with a certain amplitude the average transfer function becomes
straight.


9. Basic digital-to-analog conversion


This direction of conversion will be discussed first, since ADCs 
often use embedded DACs in feedback loops. 

The purpose of a digital to analog convertor is to take 
numerical values and reproduce the continuous waveform that they 
represent. The jitter in the clock needs to be removed with a VCO or 
VCXO. Sample values are buffered in a latch and fed to the convertor 
element which operates on each cycle of the clean clock. The output 
is then a voltage proportional to the number for at least a part of 
the sample period. A resampling stage may be found next, in order to 
remove switching transients, reduce the aperture ratio or allow the 
use of a convertor which takes a substantial part of the sample 
period to operate. The resampled waveform is then presented to a 
reconstruction filter which rejects frequencies above the video band. 
This section is primarily concerned with the implementation of the 
convertor element. The most common way of achieving this conversion 
is to control binary-weighted currents and sum them in a virtual 
earth. This is often done with the classical R-2R DAC structure. This 
is relatively simple to construct, but the resistors have to be 
extremely accurate. 
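
The need for accuracy can be illustrated with a simple model of the binary-weighted current summation. This is a sketch only: the function name, the unit current of 1.0 and the 1% error figure are assumptions made for the example.

```python
def dac_output(code, bits=8, errors=None):
    """Sum the binary-weighted currents selected by `code`.

    `errors` optionally maps a bit position (0 = LSB) to the fractional
    error of that bit's current; the unit (LSB) current is 1.0.
    """
    errors = errors or {}
    total = 0.0
    for bit in range(bits):
        if (code >> bit) & 1:
            total += (1 << bit) * (1.0 + errors.get(bit, 0.0))
    return total

# Step between codes 127 and 128, where every bit changes state.
ideal_step = dac_output(128) - dac_output(127)

# Hypothetical convertor whose most significant current is 1% low.
msb_low = {7: -0.01}
bad_step = dac_output(128, errors=msb_low) - dac_output(127, errors=msb_low)

print(ideal_step, round(bad_step, 2))
```

In this model a 1% error in the most significant current makes the output fall when the code rises from 127 to 128, so the convertor is no longer monotonic. The tolerance needed on the larger weights is therefore set by the size of the smallest step.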


10. Basic analog-to-digital conversion


The general principle of a quantizer is that different quantized
voltages are compared with the unknown analog input until the closest
quantized voltage is found. The code corresponding to this becomes
the output. The comparisons can be made in turn with the minimal
amount of hardware, or simultaneously with more hardware.

The flash converter is probably the simplest technique available
for PCM video conversion. The threshold voltage of every quantizing
interval is provided by a resistor chain which is fed by a reference
voltage. This reference voltage can be varied to determine the
sensitivity of the input. There is one voltage comparator connected
to every reference voltage, and the other input of all of the
comparators is connected to the analog input. A comparator can be
considered to be a one-bit ADC. The input voltage determines how many
of the comparators will have a true output. As one comparator is
necessary for each quantizing interval, then, for example, in an
8-bit system there will be 255 binary comparator outputs, and it is
necessary to use a priority encoder to convert these to a binary
code. Note that the quantizing stage is asynchronous; comparators
change state as and when the variations in the input waveform result
in a reference voltage being crossed. Sampling takes place when the
comparator outputs are clocked into a subsequent latch. This is an
example of quantizing before sampling as was mentioned earlier.
Although the device is simple in principle, it contains a lot of
circuitry and can only be practicably implemented on a chip. The
analog signal has to drive a lot of inputs which results in a
significant parallel capacitance, and a low-impedance driver is
essential to avoid restricting the slewing rate of the input. The
extreme speed of a flash converter is a distinct advantage in
oversampling. Because computation of all bits is performed
simultaneously, no track/hold circuit is required, and droop is
eliminated.
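
The comparator bank and priority encoder can be modelled in a few lines. This is a behavioural sketch only: the threshold positions (equal steps of the reference voltage), the function names and the test voltages are assumptions, and no real device is being described.

```python
def comparator_outputs(v_in, v_ref=1.0, bits=8):
    """One comparator per tap of the resistor chain (255 for 8 bits)."""
    levels = 2 ** bits
    thresholds = [k * v_ref / levels for k in range(1, levels)]
    return [v_in >= t for t in thresholds]

def priority_encode(outputs):
    """Binary code = number of the highest comparator that is true."""
    for k in range(len(outputs), 0, -1):
        if outputs[k - 1]:
            return k
    return 0

def flash_adc(v_in, v_ref=1.0, bits=8):
    return priority_encode(comparator_outputs(v_in, v_ref, bits))

n_comparators = len(comparator_outputs(0.5))
mid_code = flash_adc(0.5)                 # half of the reference voltage
rescaled = flash_adc(0.25, v_ref=0.5)     # halving v_ref doubles sensitivity

print(n_comparators, mid_code, rescaled)
```

Halving the reference voltage gives the same code for half the input voltage, which is the sensitivity adjustment mentioned in the text.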


Understanding Digital Television


Digital Tape Recording 


Overview 


This set of notes and lecture will introduce the concepts of digitally recording the 
television signal. Differentiation between the recording methods and formats, 
together with some of the concepts that are not encountered with conventional 
analogue recording will also be discussed. 


Why Digital 

The advantages of digital recording are overwhelming, as indeed are the overall 
advantages of handling the television signal in the digital form within the studio. 
Eventually the entire path from image acquisition through to delivery will be a series 
of digital channels. 


The initial steps towards a fully digital studio highlighted the need for digital 
recording and suggested that this would offer the widest group of advantages for a 
task that had in many ways probably reached the limits of perfection with the 
analogue technology then employed. It was also apparent that of all the tasks to 
develop digital television equipment, the digital television recorder provided the 
greatest challenge.


This was recognised in the early part of the 1980s when the world's users and
manufacturers set out to establish a format that could record the television signal
within the, by then, defined (CCIR-601) digital component distribution hierarchy.
The format developed is now well known as D-1, and currently forms the backbone
of component digital post production world wide.


The specific advantages that were sought included the possibility of multi-generation 
recording with no inter-generation loss or degradation. This in itself allowed the full 
exploitation of the other rapidly developing techniques that could be best exploited by
“assembling” the final picture as a series of superimposed layers.


The advantages that digital recording brought to the audio tracks are of equal
importance, as often the requirements of high quality audio recording were in conflict
with the tasks of video recording.


The fact that the entire signal path was approaching digital solutions also made the 
requirement for a digital recorder imperative. 


Composite or Component 

In view of the earlier comments that bodies such as the EBU and the SMPTE, who
represented broadcasters, had defined a component recording strategy and that
manufacturers, initially Sony, had produced equipment to this specification, it is
surprising that alternative composite formats should have been introduced. Indeed,
why was there any requirement for the consideration of additional digital recording
formats?


In retrospect it is clear that the over-riding factors were associated with the features and
economic performance of the overall component environment, rather than its technical
performance, combined with the enormous investment that the majority of
broadcasters had in both composite and analogue studio and distribution equipment.


D-1 Characteristics


• Component recording intended to integrate into a fully component digital
  environment.

• Limited recording times

• Limited trick effects

• Expensive equipment


Brief summary of critical D-1 Characteristics


Signals                  Recording time
Digital Only             11 min (Small Cass.)*
CCIR-601 Video           34 min (Med Cass.)*
AES Audio                74 min (Large Cass.)*


* Small increase possible with use of a thinner tape


Broadcaster needs

• A large pre-existing composite analogue environment
• “Digital replacement” for Type-C recorders
• Solutions provided by D-2 and D-3 format recorders. Both of these are
  intended to record either the PAL or NTSC signal directly converted to a
  digital representation. Both formats are intended to offer easy integration
  into an analogue environment.


Ideal minimum requirements of Broadcaster 


30 min (Small Cass.)
90 min (Med Cass.)
180 min (Large Cass.)


Post Production needs


• More rapid acceptance of D-1 based digital component systems for the
  specific advantages that were brought to post production applications.

• Restrictions of D-1 were not of such concern, as trick effects could be
  provided from disk, often as an off-line task.

• Most users are now employing D-1, Digital Betacam, D-5 or DCT, all of
  which are component recording systems.


Needs of a Digital Recorder 


There are several definable elements that comprise a digital video recorder, and 
equally key characteristics that differentiate a digital video recorder from a pure data 
recorder. 


These elements are dealt with in the following sections; however, it is essential to
recall that the design criterion of contemporary digital recorders has been to emulate
the operation and features of pre-existing analogue recorders. Therefore many design
solutions have been selected with consideration of the need to provide features
such as variable speed play back and independent editing of video and audio.
Additional consideration of the television signal, with the need to record continuously
in real time, leads to designs that are very different from those that would have been
adopted in the case of a pure data recorder.


Encoding to digital 


In an entirely digital installation it would be expected that the recorder would receive 
and deliver digital data appropriate to the system in use, composite or component. 


However, in order to facilitate the integration of digital VTRs into otherwise
analogue environments it is normal for systems to offer either external or internal
A/D and D/A converters.


Composite 

Employed in the D-2 and D-3 recording formats. The analogue studio signal (PAL or
NTSC) is subjected in its composite form to direct conversion to a digital
representation. Initially everything is converted to digits: video, sync, sub-carrier and
burst. Traditionally 8 bits of data are employed, giving 256 quantizing levels.
To ensure that the greatest resolution is applied to the video signal, the actual
quantizing range need only extend over the range from the lowest to the highest level
that sub-carrier reaches. In practice it is necessary to allow some margin for higher
than standard level signals, so that clipping does not occur and excessive
demand is not placed upon control of input levels.


[Figure: composite signal quantizing range, showing the luminance amplitude, the
overall amplitude including chroma, and the overall amplitude including chroma
with overhead provision (256 units)]


Component 


Similar to the basic components of the composite television signal. The information
originally contained in three equal bandwidth signals representing Red, Green and
Blue is reduced in overall bandwidth demand. This is achieved by representing the
signal as a full bandwidth luminance signal plus two reduced bandwidth signals that
describe the “colour difference” signals. Effectively, sufficient information from the
three original signals occupying 15 MHz is carried in a total analogue bandwidth of
10 MHz.


[Figure: component signal quantizing ranges, showing the luminance amplitude within
a total dynamic range including overhead (256 units), and the total chroma dynamic
range including overhead (256 units)]


It is these luminance and two colour difference signals that are individually digitised
for the component television scheme and it is these signals that are recorded.


Sampling frequency 


Composite 


The choice of sampling frequency is related to bandwidth requirements, as defined by 
Nyquist. This would suggest that in the case of a 5 MHz bandwidth composite signal 
the sampling frequency should be in excess of 10 MHz. In fact to simplify filter 
design a frequency somewhat higher than this is usually chosen. 


In the case of a television signal that has a high energy content at relatively high
frequencies, from the colour sub-carrier, there are further advantages if the sampling
frequency is an exact multiple of the sub-carrier frequency. This helps to minimise
the effects of inter-modulation between the sampling and the sub-carrier.


Early composite digital systems sampled at 3 times the sub-carrier frequency,
approximately 13.3 MHz for PAL systems with a sub-carrier of 4.43 MHz and 10.74
MHz for NTSC 3.58 MHz systems. Other than for early experimental systems no
digital VTRs were produced that employed 3 x fsc sampling, although many digital
time base correctors were.


There are significant advantages in a yet higher sampling rate; these include more
efficient filter design and operation, together with a further reduction in unwanted
interference. If the sampling frequency is an even multiple of the sub-carrier
frequency then there are additional benefits in that complex mathematical processing
of the signal is greatly simplified or indeed made possible. This can include the
separation, totally within the digital domain, of the chrominance information from the
composite signal. This is an important advantage for the processing that is often
required, for example in the reconstruction of the true PAL sequence (in a composite
recorder) when providing variable speed play back, where original off tape fields are
either repeated or omitted.


For these reasons, as soon as semiconductor technology allowed economic signal
processing at the higher data rates required by 4 x fsc sampling, all contemporary
composite digital systems have been based upon 17.73 MHz sampling for PAL and
14.32 MHz for NTSC applications.


[Figure: Analogue signal - time quantized by sampling clock]


Earlier it was indicated that 8 bits per sample have traditionally been employed in
composite systems. This directly leads to a raw data rate of nearly 18 MBytes per
second for PAL, some 142 MBits per second! Such data rates were unheard of in the
traditional (computer) data recording world.
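
The composite sampling frequencies and raw data rates can be checked with a short calculation. The sub-carrier frequencies used below are the standard PAL and NTSC values, quoted to more digits than in the text; the function name and the rounding are choices made for this sketch.

```python
# Colour sub-carrier frequencies in Hz (standard values, not from the notes).
FSC_HZ = {
    "PAL": 4_433_618.75,
    "NTSC": 315_000_000 / 88,   # approximately 3.579545 MHz
}

def composite_rates(fsc_hz, multiple=4, bits=8):
    """Return (sampling frequency in Hz, raw data rate in bits/s)."""
    fs = multiple * fsc_hz
    return fs, fs * bits

pal_fs, pal_bps = composite_rates(FSC_HZ["PAL"])
ntsc_fs, ntsc_bps = composite_rates(FSC_HZ["NTSC"])
ntsc_3fsc, _ = composite_rates(FSC_HZ["NTSC"], multiple=3)

print(f"PAL  4 x fsc: {pal_fs / 1e6:.2f} MHz, {pal_bps / 1e6:.1f} MBits/s")
print(f"NTSC 4 x fsc: {ntsc_fs / 1e6:.2f} MHz, {ntsc_bps / 1e6:.1f} MBits/s")
```

The difference between the PAL and NTSC results, roughly 142 against 115 MBits per second, is the difference in data volume that the two versions of a composite format have to accommodate.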


Note also that the PAL and NTSC systems each require very different solutions, with
little possibility of switchable standards machines. Not only are the sampling
frequency and data processing needs different but, as the volume of data differs,
compromises must be made in the actual recording process. For a given cassette size,
and hence area of tape, either the recording time will differ or the packing density
must be increased for PAL with respect to NTSC, at a cost to format robustness.


[Figure: Time quantized analogue signal - amplitude quantized for binary description]


Component 


Although there is now no requirement to consider relating the sampling to sub-carrier
frequency, there are advantages if the sampling is related directly to the horizontal
line period. In view of the similarity of the active line period of the 625/50 and
525/60 television systems, the sampling frequency has been chosen to give an
identical number of samples (720) per active line for the full bandwidth
luminance signal.


This equates to a sampling frequency of 13.5 MHz for luminance.
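
The relationship between 13.5 MHz and the line period can be verified with exact arithmetic. The line frequencies below are the standard values for the two scanning systems and are not quoted in these notes; the variable names are chosen for this sketch.

```python
from fractions import Fraction

FS_HZ = Fraction(13_500_000)

# Horizontal line frequencies in Hz (standard values, assumed here).
LINE_HZ = {
    "625/50": Fraction(15_625),
    "525/60": Fraction(4_500_000, 286),   # approximately 15734.27 Hz
}

ACTIVE_SAMPLES = 720

samples_per_line = {name: FS_HZ / f for name, f in LINE_HZ.items()}
blanking_samples = {name: n - ACTIVE_SAMPLES
                    for name, n in samples_per_line.items()}

for name, n in samples_per_line.items():
    print(f"{name}: {n} samples per total line, "
          f"{blanking_samples[name]} outside the 720 active samples")
```

Both systems give a whole number of samples per line, so the sampling structure repeats identically from line to line in either standard.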


Each of the narrower bandwidth colour difference signals is sampled at half this
frequency, 6.75 MHz. In total, therefore, the component digital signal comprises one
data stream of 13.5 MBytes per second plus two further data streams of 6.75 MBytes
per second each, giving a total of 27 million samples (27 MBytes at 8 bits) per second.
This is equivalent to 216 MBits per second if 8 bits per sample are used and
270 MBits per second if 10 bits per sample are employed. Increasingly there is a
trend towards 10 bit systems to allow for the greater amplitude resolution that is
demanded by many current processing systems.


The closeness of 13.5 MHz to 4 x fsc (NTSC) 14.32 MHz has led to the generic
description of these three channels of data (13.5, 6.75 and 6.75 MHz) as 4:2:2.
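
The component data rates follow directly from the three sampling frequencies, as this short calculation shows. The function and variable names are chosen for the sketch.

```python
Y_HZ = 13_500_000   # luminance sampling frequency
C_HZ = 6_750_000    # each colour difference sampling frequency

def component_rate(bits):
    """Return (samples per second, bits per second) for a 4:2:2 signal."""
    samples = Y_HZ + 2 * C_HZ
    return samples, samples * bits

samples_per_s, rate_8bit = component_rate(8)
_, rate_10bit = component_rate(10)

print(f"{samples_per_s / 1e6:.0f} Msamples/s, "
      f"{rate_8bit / 1e6:.0f} MBits/s at 8 bits, "
      f"{rate_10bit / 1e6:.0f} MBits/s at 10 bits")
```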


Data efficiency 


Only record what is required: any unnecessary information need not be recorded, and
any predictable information can be recreated and so need not be recorded.


Good News 


The horizontal sync and blanking periods contain only repetitive data that complies with
defined standards and can therefore be recreated. The same applies to the vertical
synchronising period.


Together these offer a potential reduction in the data to be recorded, for both
composite and component systems, of approximately 16%.


Bad News 


Audio must be recorded. This can consume a surprising amount of additional data.
Four channels of AES audio (48 kHz sampling with 20 bit resolution) with each
sample written twice to tape amount to close to 8 MBits per second of additional data.


Beyond this, as we will see, there is additional data that must be recorded, associated
with the need to detect and correct numerical errors that will occur during playback of
the data. This, together with essential data addressing information, can add between
sixteen and twenty percent additional recorded data.
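
The good news and bad news can be combined into a rough budget. This is a sketch only: the way the savings and overheads are combined (blanking removed first, audio added, then protection and addressing overhead applied to the sum) is an assumption made for illustration, and actual formats allocate data differently.

```python
def audio_rate(channels=4, fs_hz=48_000, bits=20, copies=2):
    """Audio data rate in bits/s, each sample written `copies` times."""
    return channels * fs_hz * bits * copies

def recorded_rate(video_bps, blanking_saving=0.16, overhead=0.16):
    """Approximate rate written to tape in bits/s (assumed model)."""
    active_video = video_bps * (1 - blanking_saving)
    return (active_video + audio_rate()) * (1 + overhead)

audio_bps = audio_rate()
component_low = recorded_rate(216e6, overhead=0.16)
component_high = recorded_rate(216e6, overhead=0.20)

print(f"audio: {audio_bps / 1e6:.2f} MBits/s")
print(f"8 bit component: {component_low / 1e6:.0f} to "
      f"{component_high / 1e6:.0f} MBits/s on tape")
```

Under these assumptions the blanking saving and the protection overhead roughly cancel, so the rate written to tape stays close to the raw video rate.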


How much data and how is it put on tape. 


Depending upon the format (composite or component) it is apparent that between 140 
and 270 MBits of data per second must be written to tape, suggesting a recorder 
capable of recovering a “carrier” between 70 and 135 MHz (in conventional analogue 
terms), with sufficient modulated bandwidth to carry the recorded information. 


Some physical limitations 

The majority of the apparently arbitrary parameters employed by each format are in
fact determined by physical limitations. These include the practical and economic
accuracy of servo systems, the desired record channel signal to noise ratio, and tape and
head characteristics, and they are even influenced by the set of features required.


In reality any format is a carefully balanced compromise between performance,
features and robustness. In the case of a digital format it becomes possible to a certain
degree to reduce intrinsic robustness by accepting that additional error correction
overhead may be required to ensure the desired level of overall performance.


Often overlooked in the case of a digital recorder is the importance of the signal to 
noise ratio of the record channel, often known as the “RF” path or data onto and off 
of tape. 


Whereas in an analogue recorder the noise added to the RF signal directly affects the 
signal to noise ratio of the reproduced video, in a digital recorder the RF noise causes 
playback data errors. It then becomes part of the error correction strategy to 
accommodate the detection and correction of all such errors, whilst leaving sufficient 
margin available to correct the majority of “burst” errors associated with tape 
imperfections such as drop-outs and other physical damage. 


In theory it is possible to correct for very high error rates; however, it is at the expense
of having to add extensive additional data for the purposes of protection. This in itself
adds to the total volume of data to be recorded, possibly further aggravating the
problem. In practice there is an optimum amount of additional code that can
effectively be applied. From this it is then practical to calculate the signal to noise
ratio required of the recording channel. This then determines the track width and
recorded wavelengths that are desirable.


[Figure: Effect on error rate of varying recorded wavelength and track width]


Several factors interrelate. Signal to noise ratio is directly proportional to track width
and to the magnetic efficiency of tape and reproduce heads, and deteriorates rapidly as
recorded wavelength is reduced (partly as a result of reduced magnetic efficiency
and partly as a result of the increasing significance of head to tape separation loss).
Actual combinations of parameters vary from format to format; in practice, however,
track widths employed by current recorders range from approximately 15 µm to 45 µm.
Similarly, recorded wavelengths are found to be between 0.5 µm and 1.0 µm. For
practical head to tape writing speeds this limits the maximum recorded frequency to
considerably less than that indicated earlier as required by the digital television signal.


This conflict is easily overcome by splitting the digital data into a series of parallel
paths, each simultaneously recording part of the total data volume.


Typically, for the less demanding data capacity requirements of composite recorders,
two data channels are employed, whereas conventional component recorders require
four simultaneous channels to ensure that the recorded wavelength is kept within
practical limits. Some of the component recording formats that employ image data
reduction are able to offer systems requiring only two channels.
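
The effect of splitting the data into parallel channels can be estimated as follows. This is a sketch under loudly stated assumptions: the head to tape writing speed of 30 m/s is a hypothetical round figure, not a value from any format specification, and the highest recorded frequency is taken as half the channel bit rate, as in the "carrier" estimate given earlier. Real channel codes alter this relationship.

```python
ASSUMED_WRITING_SPEED_M_S = 30.0   # hypothetical head to tape speed

def shortest_wavelength_um(total_bps, channels,
                           writing_speed_m_s=ASSUMED_WRITING_SPEED_M_S):
    """Shortest recorded wavelength in micrometres for one channel."""
    f_max_hz = total_bps / channels / 2   # half the channel bit rate
    return writing_speed_m_s / f_max_hz * 1e6

composite_1ch = shortest_wavelength_um(142e6, 1)
composite_2ch = shortest_wavelength_um(142e6, 2)
component_1ch = shortest_wavelength_um(220e6, 1)
component_4ch = shortest_wavelength_um(220e6, 4)

for label, value in [("composite, 1 channel", composite_1ch),
                     ("composite, 2 channels", composite_2ch),
                     ("component, 1 channel", component_1ch),
                     ("component, 4 channels", component_4ch)]:
    print(f"{label}: {value:.2f} um")
```

With the assumed writing speed a single channel would need wavelengths shorter than 0.5 µm in both cases, whereas two composite channels or four component channels bring the wavelength back into the practical region.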


In summary, there is a considerable amount of data to be recorded, possibly
exceeding 100 GBytes per hour. Actual recording parameters depend not only on the
physical characteristics of tape and heads, but also on objectives such as desired
cassette size, record duration and any compatibility that is desired with other
formats. There is also a trade-off between signal system and mechanical complexity
on the one hand and signal processing on the other.


[Figure: composite recorder signal path. Composite input (analogue or digital) and
audio input channels (analogue or digital) pass through input processing, A to D
conversion and the data multiplexer. Data at approximately 142 MBits/sec; data
stream to each head channel approximately 70 MBits/sec.]


[Figure: component recorder signal path. Component input (luminance and colour
difference, analogue or digital) and audio input channels (analogue or digital) pass
through input processing, A to D conversion, formatting, audio and ECC, then the
data multiplexer, to four channels of data recording. Data at approximately
220 MBits/sec; data stream to each head channel approximately 55 MBits/sec.]


4:3 and 16:9 


Essentially a recorder is not sensitive to the aspect ratio of the original image; it is
only the data associated with that image that is stored. This is unlike the situation
associated with cameras, DVEs and mixer patterns, where the picture geometry
differences must be considered.


The only exception is if the overall system, in order to preserve equal resolution in
the wider aspect ratio, increases the sampling frequency beyond that currently defined
by CCIR-601. In this case the volume of data to be recorded likewise increases.


[Figure: active line sampled as 720 samples (13.5 MHz) and as 960 samples (18.0 MHz)]
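
The 960 sample and 18 MHz figures follow from scaling the 4:3 values by the ratio of the two picture widths, as this short check shows. The variable names are chosen for the sketch, and the 8 bit 4:2:2 data rate is included only to show how the data volume scales.

```python
from fractions import Fraction

# Ratio of picture widths for equal picture height.
width_ratio = Fraction(16, 9) / Fraction(4, 3)

samples_16_9 = 720 * width_ratio             # active samples per line
fs_16_9_mhz = Fraction(27, 2) * width_ratio  # luminance sampling, MHz
rate_16_9_mbits = 216 * width_ratio          # 8 bit 4:2:2 data rate

print(width_ratio, samples_16_9, fs_16_9_mhz, rate_16_9_mbits)
```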


Techniques and buzz words 


Analogue recording of the television signal required that little or no change was made
to the incoming video and audio signals prior to recording. The video frequency
modulated the recording carrier, and after suitable amplification this was used to
drive the video heads and magnetise the tape. Essentially the string of picture
elements was recorded sequentially, close to real time.


In the digital recorder the process is neither so simple nor so direct. Part of this is 
necessity, part of this is to exploit advantages that digital processing affords. 


Error correction 

With analogue recorders noise and tape blemishes impaired the reproduced signal 
quality. This degradation was cumulative from generation to generation and in fact 
became the limiting factor in the overall performance of analogue recorders. 


In the case of digital recording, this same noise and these blemishes cause data to be either
missing or in error. By the very nature of a binary number, which is represented by
“zeros” and “ones”, if the position (or presence) of an error within the data can be
detected then correction of that error is simply a case of changing the current value of
that digit.


Simple “check sums” can readily determine words (or bytes) of data that are in error;
however, there is little possibility of determining which bit is erroneous. Two
dimensional arrays can locate single bit errors from a group of words and thus
correction becomes possible. Only a single bit from the array can be corrected and so
the overall efficiency is poor.
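
The two dimensional array principle can be demonstrated with simple parity. This is a minimal sketch of row and column parity only, not the scheme used by any recording format; the function names and data words are invented for the example.

```python
def parity_check_bits(words, width=8):
    """Return (row parities, column parities) for a block of words."""
    rows = [bin(w).count("1") % 2 for w in words]
    cols = [sum((w >> b) & 1 for w in words) % 2 for b in range(width)]
    return rows, cols

def correct_single_bit(words, rows, cols, width=8):
    """Invert the one bit located by a failing row and a failing column."""
    new_rows, new_cols = parity_check_bits(words, width)
    bad_rows = [i for i, (a, b) in enumerate(zip(rows, new_rows)) if a != b]
    bad_cols = [i for i, (a, b) in enumerate(zip(cols, new_cols)) if a != b]
    fixed = list(words)
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        fixed[bad_rows[0]] ^= 1 << bad_cols[0]
    return fixed

data = [0x10, 0xEB, 0x80, 0x7F]
rows, cols = parity_check_bits(data)      # recorded alongside the data

one_error = list(data)
one_error[2] ^= 0x08                      # single bit in error
two_errors = list(data)
two_errors[1] ^= 0x03                     # two bits in error in one word

repaired = correct_single_bit(one_error, rows, cols)
not_repaired = correct_single_bit(two_errors, rows, cols)
```

The single error is located by the intersection of one failing row and one failing column and is corrected by inverting that bit. The double error leaves the row parity unchanged, so it cannot be located, which illustrates the limited power of such a simple array.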


As semiconductor technology has advanced, allowing very fast complex mathematics
to be performed, more sophisticated algorithms have evolved that provide more
extensive data protection. Of these, product codes offer a simple and effective means
of determining the location of bit errors within complex word strings. Two
dimensional arrays provide powerful and efficient methods of providing error
detection whilst offering an excellent correction capability, by providing finite
protection to part of the error protection code itself.


[Figure: Two Dimensional Error Correction Code Matrix]


The most familiar of these is the Reed Solomon Product code. This is used, in 
differing implementations, to provide the error protection employed in CD recording, 
digital video recorders and some data recording systems. 


Like all error detection and correction schemes, the actual “power”, or magnitude of
error that can be corrected, is dependent upon how much additional data can be
provided to carry detection and correction code. Formats vary in the extent of
correction that they offer.


Shuffling 


One of the many advantages of digital data, compared with the analogue equivalent,
is the opportunity to add additional information that does not become part of the
image. This in turn provides the capability of dividing the data stream into a series of
blocks and then giving each block of data a unique address. This address then
removes the necessity to carry the data in a pixel by pixel sequential order, and the
necessity to record truly in sequential real time.


Any error correction scheme has its limits beyond which correction of detected errors 
is no longer possible. In a true data recorder this performance limit must be beyond 
the point at which the data errors are either insignificant or so improbable as to be 
acceptable. 


A television image contains considerable redundancy, and this has always been
exploited to allow correction (or more accurately masking or concealment) of tape
imperfections such as drop outs. The process evolved to provide very effective drop
out compensation for analogue recorders, with missing information derived by
interpolation of surrounding valid information. Nevertheless, in the case of large
drop outs performance was limited by the lack of accurate surrounding information.


Shuffling, or re-ordering the sequence in which the data is recorded, with
complementary “de-shuffling” in playback, allows a single large (un-correctable)
error with poor concealment possibility to be exchanged for many small errors
offering the possibility of near invisible concealment.
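
The exchange of one large error for many small ones can be shown with a one dimensional model. This is a sketch only: the block of 64 samples, the stride of 23 and the position of the drop-out are arbitrary choices for the illustration, and the shuffle patterns of actual formats are considerably more elaborate.

```python
N_SAMPLES = 64
STRIDE = 23    # hypothetical shuffle step; shares no factor with 64

# order[p] is the picture sample recorded at tape position p.
order = [(p * STRIDE) % N_SAMPLES for p in range(N_SAMPLES)]

def longest_run(lost):
    """Length of the longest run of adjacent lost sample indices."""
    best = run = 0
    previous = None
    for i in sorted(lost):
        run = run + 1 if previous is not None and i == previous + 1 else 1
        best = max(best, run)
        previous = i
    return best

burst = range(20, 28)                         # drop-out hits 8 tape positions
lost_plain = set(burst)                       # recorded in picture order
lost_shuffled = {order[p] for p in burst}     # after de-shuffling

print("without shuffling:", sorted(lost_plain))
print("with shuffling:   ", sorted(lost_shuffled))
```

Without shuffling the drop-out removes eight adjacent samples. With shuffling the same eight losses are scattered so that every missing sample still has valid neighbours from which it can be concealed.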


[Figure: Result of “Burst Error” (Tape Drop-out) Without Shuffling and With Shuffling]


The audio signal contains little or no natural redundancy, and therefore cannot rely
upon such concealment techniques. Either the redundancy must be provided as a part
of the recording process, for example by duplicating each audio sample in differing
parts of the tape, or considerably more powerful detection and correction must be
provided for the audio data than for the video. Often a combination of both techniques is
employed.


Channel Code 


The complex stream of data that comprises the video signal in digital form, to which
has been added the audio data, both with individual and overall error detection and
correction code, plus data block addressing, is the signal that must be recorded on
tape in several data channels.


The actual “magnetic” representation of the digital data is, like so many of the
aspects of the digital VTR, dependent on format, this in turn in many cases being
influenced by the manufacturer who developed the format.


Essentially all of the various “channel” codes seek the same objectives: data reliably
written to and read from tape, with ideally the spectrum resulting from data modulation as
narrow as possible. This latter requirement eases the tasks of equalisation,
improves the off-track performance and makes signal recovery easier at speeds
other than normal play. Each manufacturer claims some key benefit for its particular
code. In reality all the modern codes perform well.


Tracks on Tape 


Earlier formats, such as D-1, borrowed heavily from analogue recording techniques,
with helical tracks each separated by a guard band.


Due to the relatively high writing speeds required, even with multiple channel 
recording, the track length required to record a full field of the television signal 
would be excessive. In practice, therefore, all existing digital formats employ 
segmented recording. The restrictions associated with analogue segmented recording 
formats are eliminated by the numerical accuracy of digital recording. 


The audio channels are multiplexed into the total data stream in such a way as to
allow individual access to these “blocks” of data. This provides independent editing
of video and each of the audio channels (tracks).


In any helical scan format some track straightness errors can be anticipated; these can
be expected to be at their greatest at the ends of the tracks. It was indicated earlier
that the audio signal requires more care in recording as there is no natural redundancy
and un-corrected audio errors are far more objectionable than video errors. For this
reason it became customary to record the audio at the centre of the tape width where
the track is at its straightest and any tape damage is expected to be minimum.


One of the limitations of the D-1 format is that the start of the recording of the
television field is also positioned at the centre of the tape. This makes trick effects,
particularly variable speed playback, more difficult to implement.


The previously required guard band had to be of sufficient width to ensure that the 
fringing field from adjacent tracks did not interfere. 


Traditional tracks, separated by a guard band 


More recent formats exploit the fact that shorter wavelength recording, occupying 
the narrow spectrum associated with modern channel codes, allows alternative 
methods of separating adjacent tracks. 

With the narrower spectra afforded by such channel codes, separation is now usually 
achieved by azimuth recording, where adjacent tracks are recorded with alternating 
offsets of the head azimuth. 
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Adjacent tracks separated by alternating record gap azimuth 


The elimination of the guard bands contributes directly to an increase in data packing 
density, since the previously employed guard bands could waste as much as 30% of 
the available tape area. In addition, azimuth recording allows the use of a playback 
head that is wider than the recorded track. This provides considerable additional 
tolerance to tracking errors. 
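
Why short recorded wavelengths make azimuth separation workable can be illustrated 
numerically. The sketch below uses the standard textbook expression for azimuth loss, 
20 log10(sin x / x) with x = pi * w * tan(angle) / wavelength. The track width, azimuth 
difference and wavelengths are illustrative assumptions, not figures for any 
particular format. 

```python
import math

def azimuth_loss_db(track_width_um, wavelength_um, azimuth_diff_deg):
    """Attenuation in dB (positive = weaker) when the replay gap is rotated by
    azimuth_diff_deg relative to the recorded transitions."""
    x = math.pi * track_width_um * math.tan(math.radians(azimuth_diff_deg)) / wavelength_um
    if x == 0.0:
        return 0.0           # gap parallel to the transitions: no loss
    ratio = abs(math.sin(x) / x)
    if ratio == 0.0:
        return math.inf      # exact null of the sin(x)/x response
    return -20.0 * math.log10(ratio)

# Assumed example values: 20 um track, adjacent tracks differing by 30 degrees.
short_wave = azimuth_loss_db(20.0, 0.8, 30.0)   # short wavelength from the neighbouring track
long_wave = azimuth_loss_db(20.0, 20.0, 30.0)   # long wavelength from the neighbouring track
```

With these assumed numbers the short wavelength signal picked up from the neighbouring 
track is attenuated by more than 30 dB, whereas a long wavelength component would be 
attenuated by only a few dB. This is why a narrow, short wavelength spectrum matters 
when the guard band is removed. 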


TRAVEL DIRECTION OF TAPE 
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Typical Tape Footprint 


Actual variations between formats include track and tape width, and recorded 
wavelengths. Specific electro-mechanical differences relate to track separation 
methods, location of audio blocks and the physical point at which the beginning of a 
field recording starts. 


Guard band     Centre of helical track 
Azimuth        Ends of helical tracks 
Azimuth        Centre of helical track 


The D-5 format is mechanically similar to D-3 and uses a similar channel code. D-3 is a composite recording format; this 
similarity allows the additional data associated with component recording to be accommodated in the D-5 recorder by doubling 
the number of record channels. Combined with a doubling of the linear tape speed, this offers component recording on a format 
that can, by a simple speed change, provide compatibility with the composite D-3 format. 


Error Concealment 


A digital recorder intended for pure data recording applications, whilst having greater 
demands placed upon it for data accuracy, can at the same time often exploit the 
non-real-time nature of events to achieve this level of performance. This may, for 
example, include read after write, with successive attempts to re-write if errors are 
detected, even with the possibility of moving to another portion of tape. During this 
time the computer or data buffer holds the data and waits until the tape is once again 
ready to write. 


Likewise, errors detected during play back are often only a transitory phenomenon, 
where a subsequent re-reading of the tape will produce an error free recovery of data. 


In a video tape recorder the continuity of recording or replay is essential (in most 
applications) and it must be recognised that there are limits to the overhead that may 
be added for error correction. Fortunately, any image that conveys a recognisable 
pattern by definition contains significant redundancy of information. It is this spatial 
(and sometimes temporal) redundancy that is exploited to mask those errors that are 
detected but cannot be corrected. The use of data shuffling ensures that the majority 
of these errors are confined to single or small pixel blocks. The good data 
surrounding such errors can be effectively employed by carefully weighted 
interpolation to create a very close synthesis of the missing data. 
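
As a minimal sketch of the idea, the following replaces each uncorrectable pixel 
with a weighted average of its good neighbours. The function name, the use of only 
the four nearest neighbours and the equal weights are assumptions made for 
illustration; the weighting actually used by any recorder is not given here. 

```python
def conceal(frame, bad, weight=1.0):
    """Return a copy of frame in which every pixel listed in `bad`
    (a set of (row, col) pairs) is replaced by the weighted mean of its
    good horizontal and vertical neighbours."""
    height, width = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for (r, c) in bad:
        total = 0.0
        weight_sum = 0.0
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < height and 0 <= cc < width
            if inside and (rr, cc) not in bad:   # only good data contributes
                total += weight * frame[rr][cc]
                weight_sum += weight
        if weight_sum:
            out[r][c] = round(total / weight_sum)
    return out

# The centre pixel (99) is flagged as an uncorrectable error.
picture = [[10, 20, 30],
           [40, 99, 60],
           [70, 80, 90]]
repaired = conceal(picture, {(1, 1)})
```

Because shuffling tends to leave an erroneous pixel surrounded by good data, all 
four neighbours are normally available, as in this example. 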
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PLAYBACK PATH OF RECORDER 


RAW ERRORS OFF TAPE 


RESIDUAL ERRORS AFTER CORRECTION 


It must be remembered, though, that the replaced data is only an approximation: a 
concealment is not a correction, and once performed such errors are propagated 
throughout all further generations. With some digital video recording formats, it is 
the frequent occurrence of concealment that ultimately becomes the criterion that 
limits the number of digital generations that can be achieved. (In theory, for 
a perfect digital recording, there should be no limit to the number of generations 
or copies that can be made.) 


PIXEL IN ERROR 


CORRECTED 
ERROR 


CONCEALED 
ERROR 


Even with the benefits of maximum surrounding data afforded by the 
use of shuffling, the capabilities of concealment are limited to an 
approximation of the absent data. 
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Today’s formats 


D-1 through D-5 (there is no D-4), Digital Betacam, DCT and DVC make up seven 
different formats, with little or no interchange compatibility between them. What 
“compatibility” does exist is limited to either some degree of mechanical similarity 
allowing dual format operation, or electrical similarities allowing exchange of data. 


In many ways it may be argued that the lack of interchange compatibility is less 
important with digital formats than it was with the previously employed analogue 
formats. In an analogue recording environment the primary objective was to minimise 
the number of generations. To achieve this, multiple machines together with large 
switchers in an edit suite allowed as much as possible to be accomplished in a single 
operation. Similarly, copying just to accommodate a differing format could not be 
considered acceptable. 


In the digital environment the restrictions of generation to generation loss have 
largely been eliminated. One of the results of this is seen in the simplification of 
equipment requirements in a complex edit: two recorders and a small but powerful 
(complex effects and keys) switcher are able to achieve on a “layer by layer” basis 
what multiple machines did in analogue systems in a single or limited number of 
“passes”. There is also a greater readiness to accept that acquisition, post-production 
and distribution do not necessarily have to be on the same format; indeed there may 
be advantages in optimising differing formats for different tasks. There are, 
nevertheless, certain limitations that should be observed; for example, optimum results 
will not be achieved if composite and component recording techniques are mixed. 


Compatibility 
Broadly, the contemporary formats may be divided into two categories: composite 
(D-2 and D-3) and component (D-1, D-5, Digital Betacam, DCT and DVC). 


Clearly D-2 and D-3, although offering electrical exchange of data via a common 
serial or parallel interface, are not physically compatible, as they use differing 
tape gauges. 


(Note that all digital formats, with the exception of D-1, employ metal tape 
technology, primarily metal particle but with future formats the trend is towards 
metal evaporated tape) 


D-3 and D-5 share a high degree of mechanical similarity, with similar tracks being 
written to tape for both the composite and component versions. The significant 
difference is that the number of tracks per field is doubled for component 
recording. This requires that the tape speed is higher and that there are more heads 
(channels) recording for component than for composite. This commonality allows 
versions of the recorders to be produced that offer switchable selection of composite 
or component operation; internal transcoding between the electrical alternatives of 
composite and component recording may also be offered. 


Digital Betacam has likewise been designed to afford levels of compatibility with 
previous formats. In this case the compatibility is with Betacam SP (a component 
analogue recording format). Similarity of track positions ensures that equipment can 
be manufactured that offers hybrid operation, permitting recording and playback of 
component digital video or the playback of Betacam SP recordings that may, for 
example, have come from either portable acquisition equipment or from material 
within the archive. To achieve digital component recording within the same (or very 
similar) tape area that is available for analogue recording, mild data compression is 
employed to reduce the video data to be recorded by a factor of approximately 2:1. 


DCT (Digital Component Technology) offers only electrical compatibility with other 
component formats. The objective of DCT is a very conservative recording strategy 
with relatively low packing density resulting from the application of image data 
compression and the availability of a large recording area on the 19mm metal particle 
tape employed. Many of the format parameters share high commonality with those of 
DST (Data Storage Technology), a format that is designed for demanding data 
recording applications. 


Interestingly, DST is also finding applications within the television industry where 
massive amounts of data are to be stored, such as graphics and the electronic 
manipulation of film images, where resolution is not limited to that of conventional 
television systems and there is no requirement for real time recording of the 
television signal. 


Specifically, DCT, by a combination of the tape mechanical format and powerful error 
correction, sets the objective of providing a post production environment where 
concealment of errors may be considered avoidable, with all errors either truly 
corrected or identified to allow later elimination. 


DVC (Digital Video Cassette) is the result of co-operation between many manufacturers 
to define the parameters of a digital format suitable for consumer use. Employing 6 mm 
tape and applying somewhat higher data compression to a component based 
recording strategy, this format will have applications beyond purely consumer 
uses. Certainly, with differing compression ratios, and hence record durations, the 
needs of both acquisition and distribution may be fulfilled by such a format. 


General Trends 
Broadly there are two inter-related aspects of all new formats. 


Higher packing density 

This is made possible by many factors, that include improved tape technologies, such 
as metal particle and metal evaporated tapes that by increasing the output whilst at the 
same time reducing tape noise and surface blemishes have allowed the employment of 
narrower tracks and shorter recorded wavelengths. Further the thin nature of the 
magnetic surface has allowed significant reduction in the overall tape thickness, 
(coating + base film + back coat), all contributing to a greater volumetric efficiency. 
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Image data reduction 


Clearly there are only disadvantages to recording more data than is actually required; 
any savings that can be made can be employed in a combination of many ways. These 
can include an increase in recording duration, a reduction in tape consumed or perhaps 
an improvement in format robustness, by either more conservative mechanical 
parameters or the addition of more error correction code data. 


All formats therefore employ some degree of data reduction, even if this is simply 
the elimination of repetitive and replaceable information such as synchronisation 
waveforms. 


It should also be recognised that all the formats discussed, both composite and 
component, only record narrower bandwidth chrominance signals, either that 
associated with the PAL or NTSC composite signals or the reduced bandwidth of the 
colour difference signals employed in component systems. Both of these are very 
effective and accepted data reduction schemes! 


Further data reduction is achieved in some formats by the employment of alternative 
coding strategies to those associated with linear PCM. The most commonly used is 
the Discrete Cosine Transform (DCT), used by both the Digital Betacam and DCT 
(Digital Component Technology) formats. Although the subject of more detailed 
explanation elsewhere, these alternative codes may be considered as a more efficient 
way of describing the same image when a high degree of redundant data can be 
anticipated. It should, however, always be remembered that whilst the 
visual image may appear totally unaffected by such data reduction schemes, as indeed 
is the intention, true numerical transparency may not be preserved. The result is that 
care must always be taken when evaluating such schemes to ensure that the results 
represent the objective to be achieved, that is the recording of an image rather than 
pure data. 


IMAGE DATA GENERATION 
PIXELS: analogue samples of depicted area of image 
DATA VOLUME: FULL DATA for every sample 


Linear PCM produces a constant volume of data for each sample, 
irrespective of the actual value or change in value with respect to 
adjacent samples. 


IMAGE DATA GENERATION 
DATA VOLUME: FULL DATA for the first sample, DIFF DATA for each following sample 


As an example, Differential PCM (DPCM), recording only the changes 
between samples, offers a significant reduction in the data to be recorded. 
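
The principle can be sketched in a few lines. This is an illustration of plain 
first-order DPCM only, with hypothetical function names and sample values; it is not 
the coding used by any of the recording formats discussed. 

```python
def dpcm_encode(samples):
    """First value is sent in full; every later value is sent as the
    difference from its predecessor."""
    if not samples:
        return []
    codes = [samples[0]]
    for previous, current in zip(samples, samples[1:]):
        codes.append(current - previous)
    return codes

def dpcm_decode(codes):
    """Rebuild the samples by accumulating the differences."""
    samples = []
    for index, code in enumerate(codes):
        samples.append(code if index == 0 else samples[-1] + code)
    return samples

line = [100, 102, 101, 105, 105, 104]      # assumed neighbouring pixel values
codes = dpcm_encode(line)                  # one FULL value, then small DIFF values
largest_diff = max(abs(c) for c in codes[1:])
```

The differences here never exceed 4 in magnitude, so they could be carried in far 
fewer bits than the 8 bit samples themselves. 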
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For the higher data rates of a digital VTR, DPCM does not prove practical, and other 
alternative methods are employed to detect and eliminate redundant image data 
within the picture. These alternative coding systems may be considered, in effect, 
more efficient code structures if they present visually the same image with less 
data. An example block diagram of the record path of a digital VTR employing DCT 
(Discrete Cosine Transform) is illustrated below. 


PCM VIDEO INPUT (FIXED DATA RATE, 216 Mb/Sec) 
  -> REMOVAL OF H & V BLANKING INTERVALS (173 Mb/Sec) 
  -> REMOVAL OF SPATIAL REDUNDANCY (DCT) 
  -> QUANTISATION OF DCT COEFFICIENTS 
  -> VARIABLE LENGTH CODING (VARIABLE DATA RATE) 
  -> BUFFER STORE (FIXED DATA RATE, 106 Mb/Sec) 

DATA RATE CONTROL: BUFFER OCCUPANCY sets the QUANTISATION THRESHOLD 

Block Diagram, Data Rate Reduction 


The output, data reduced, is then handled as described earlier, with separation of the 
data into two parallel data paths, after the addition of Audio data and error correction 
code. 
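
The transform, quantisation and rate control stages of the block diagram can be 
sketched as follows. This is a deliberately simplified illustration: it works on a 
one-dimensional block of eight samples rather than a two-dimensional picture block, 
counts surviving coefficients instead of performing variable length coding, and 
stands in for buffer feedback by doubling the quantiser step until a budget is met. 
All function names and values are assumptions. 

```python
import math

N = 8  # samples per block (assumed)

def dct_1d(block):
    """Orthonormal 8-point DCT-II of one block of samples."""
    coeffs = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        total = sum(block[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                    for n in range(N))
        coeffs.append(scale * total)
    return coeffs

def quantise(coeffs, step):
    """Divide by the step size and round; small coefficients become zero."""
    return [int(round(c / step)) for c in coeffs]

def nonzero(values):
    return sum(1 for v in values if v != 0)

def fit_to_budget(coeffs, max_nonzero, step=1.0):
    """Coarsen the quantiser until no more than max_nonzero coefficients
    survive (a stand-in for buffer occupancy raising the threshold)."""
    quantised = quantise(coeffs, step)
    while nonzero(quantised) > max_nonzero:
        step *= 2.0
        quantised = quantise(coeffs, step)
    return step, quantised

flat = quantise(dct_1d([100] * N), 1.0)            # flat area: only the first term survives
step, ramp = fit_to_budget(dct_1d([10, 20, 30, 40, 50, 60, 70, 80]), 2)
```

A flat area of picture collapses to a single coefficient, while busier material 
needs a coarser quantiser to fit the same budget. That coarsening is where numerical 
transparency is lost. 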
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Things that are different to analogue 
Things to look for and expect to be different to the analogue equivalent world. 


Input & Output 
Video may be:- 


Digital 
Composite or component 
Serial or parallel (Serial on BNC Connector, Parallel on 25 pin “D” Connector) 
8 bit or 10 bit 
May offer embedded audio (Carried on the Serial Digital Video [BNC]) 


Analogue 
Optional or standard; some machines are intended for application in a solely digital 
environment. 
Composite or component 


Audio may be:- 


Digital 
Parallel (25 pin “D” Connector) 
AES Serial (Standard XLR Connector, each connector carries two channels of audio) 
Embedded with video 


Analogue 
Usually standard as an addition to the digital interface. 


Timing and alignment 

The necessity for manual timing adjustment in the serial digital studio is largely 
eliminated, as is the need for cable equalisation. Both of these functions are 
performed automatically as an integral part of the serial interface. 


The majority of the alignment functions associated with a digital recorder are either 
unnecessary (compared with the analogue equivalent) or are automatically performed. 
These can include record and playback optimisation and equalisation, but can also 
extend to automatic edit optimisation. 


Maintenance 


Digital signal processing lends itself to the implementation of powerful diagnostics. 
Expect, therefore, comprehensive signal system, servo, transport, control and power 
management diagnostic assistance. 
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This may encompass many differing techniques, and most probably any single 
machine will include a combination of varying diagnostic processes. Most typically 
these will include extensive and essential tests at power on. These will probably 
verify the correct operation of the transport path, electronics and control, memory, 
CPU and software. 


Further continuous “background” diagnostics will monitor the continuing operation 
of power supplies, servo systems and operational functions. Most of these diagnostic 
systems are designed to prevent damage or incorrect or non-standard recordings due 
to system failures, external conditions (missing inputs for example) and operator 
errors. 
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DIGITAL AUDIO 
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Digital Digital Digital Digital 
Audio Inputs Audio Outputs Video Inputs Video Outputs 
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Finally many systems offer on-demand diagnostics of a more comprehensive nature 
to locate specific faults. This location is usually to PWA (Printed Wiring Assembly) 
level or to a sub-section of a PWA, seldom to a specific component. This recognises that 
contemporary PWA design and the use of high pin count ASIC devices often 
preclude the possibility of field repair. Actual diagnostic methods range from 
“probe” systems, through in-built signature analysis or defined test conditions, to 
comparative or “reciprocal” path fault elimination. This latter technique is sometimes 
referred to as “layered” system technology, as can be seen from the previous diagram. 


Picture and sound at non-play speeds 


These capabilities, established as highly desirable features in various analogue 
formats, are essential if efficient editing and other operational capabilities are to be 
preserved. Narrow tracks, short wavelengths and data shuffling combine to make 
the recovery of pictures in shuttle far more difficult than for analogue recording 
systems. Essentially, as “fragments” of the picture are recovered they are placed in 
the correct location of a frame memory, the location being determined from the data 
addressing information. 
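
A minimal sketch of this addressing scheme is given below. The frame memory is 
modelled as a list of picture blocks and each recovered fragment as an (address, 
data) pair; both representations are assumptions made for illustration. 

```python
def update_frame_memory(frame_memory, fragments):
    """Return a copy of the frame memory in which each recovered fragment
    has overwritten the block at the address carried with its data.
    Blocks for which nothing was recovered keep their previous contents."""
    updated = list(frame_memory)
    for address, data in fragments:
        updated[address] = data
    return updated

# Six blocks still holding an earlier picture; only two fragments are
# recovered on this head pass in shuttle.
memory = ["old"] * 6
memory = update_frame_memory(memory, [(4, "new"), (1, "new")])
```

The result holds a mixture of old and new blocks, which is exactly the mosaic 
appearance of a picture in shuttle. 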


The resulting appearance of “picture in shuttle” is that of a constantly changing and 
updating mosaic representing the changing picture. Despite these restrictions, the 
results are good enough for recognisable pictures to be achieved that can assist in 
the location of image cue points. Due to the specific track geometry of differing formats, 
it can be expected that better (more recognisable) pictures will be achieved at certain 
shuttle speeds than at others. Some recorders limit shuttle with pictures to the unique 
speeds that provide the clearest image; others offer a continuously variable shuttle 
range, with restricted image quality at certain speeds. 


The audio channels are recorded as discrete sectors of the helical tracks. This brings 
both advantages and technical challenges to the task of producing audio at non-play 
speeds. 


The challenge is first to recover, reliably, data that can be decoded. With modern 
formats that employ various automatic head tracking systems, this no longer represents 
the serious problem that it did with earlier formats, at least over a limited range of 
speeds, typically from jog up to and including the variable play range. 


Once the data is recovered, the most obvious advantage compared to analogue 
recording on a longitudinal track is that although the speed of the audio is affected by 
the linear tape speed, and hence the picture speed, the pitch remains constant. At 
speeds greater than normal play “fragments” of audio are omitted, which represents 
no serious limitation to intelligibility for the purposes intended. When the speed is 
lower than normal, duplication of “fragments” can lead to an objectionable “buzz” 
unless the manufacturer adopts one of several methods to minimise the effect. In 
practice all modern formats offer more than acceptable audio recovery at non-play 
speeds. 


Other techniques are also available that can offer perfect audio recovery at speeds 
near normal play speed. This capability has enormous advantage if programme time 
expansion or compression is required. Usually the range is restricted to ±15% of 
normal play speed. 


Signal Path Delays 


With the complex processing that is required, even in a relatively simple machine, the 
signal delay through the “E-E” path can be anticipated to be significantly larger than 
that associated with analogue VTRs. When the additional tasks of image data 
reduction are involved this delay may be of the order of several frames. 
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What might the future bring 


There are many detractors who would predict the end of tape as a viable 
television production recording medium. The assumption is that tape will be replaced 
by large disk systems. 


There is some truth in this prediction, in that disk brings many advantages, fast 
random access being the most notable. There is also the assumption that packing 
density for optical disk recording can far exceed that of magnetic recording; this is 
not true. 


The probable reality is that some acquisition needs will be beneficially accomplished 
with disk. Examples are news, where the actual record duration required is small, yet 
the ability for endless “standby” recording of a cyclic nature is an immediate 
advantage. 


Longer length recording will probably remain either on film or tape depending upon 
the production requirements. 


Post production can undoubtedly benefit from the intrinsic advantages of disk. It is 
reasonable to assume that disk will also provide the production tools of variable speed 
play and other “trick” recording capabilities. This is particularly true in the television 
commercials production sector, where “programme” duration is small yet the 
production requirements are complex. 


However, disk is neither efficient in volumetric capacity nor economic as a 
long term storage medium. Although it has been argued that disk capacity is improving 
every day, it must be remembered that the same technological advances can be applied 
to tape. Recall that with today’s component systems one hour of television is over 
100 GBytes of data, and the majority of today’s digital cassette formats can store 
sufficient data for typically two hours of programme for around £100. 
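
The storage figure can be checked with simple arithmetic. In the sketch below, 
216 Mb/s is the PCM input rate shown in the data reduction block diagram; 270 Mb/s 
(27 MHz x 10 bits) is assumed here as the corresponding 10 bit rate, since the text 
does not say which rate it has in mind. 

```python
def gbytes_per_hour(rate_mbps):
    """Storage needed for one hour at a constant rate given in Mbits/second."""
    bits = rate_mbps * 1e6 * 3600      # bits in one hour
    return bits / 8 / 1e9              # bytes, expressed in GBytes (10**9 bytes)

hour_8_bit = gbytes_per_hour(216)      # 8 bit component, blanking included
hour_10_bit = gbytes_per_hour(270)     # 10 bit component, blanking included (assumed rate)
```

These work out at roughly 97 and 122 GBytes respectively, in line with the figure 
of about 100 GBytes per hour. 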


The place for tape 

Long form acquisition 

Archive of post production systems 
Long form post production tasks 
Data exchange between sites 
Distribution 


It is neither essential nor necessarily desirable that all the above applications share a 
common format. Nor is it essential that the recorder of the future has 
all the production capabilities of today’s recorders. They may be simply data stores. 


The place for disks 


Some short form acquisition (News) 

Some short form distribution 

Data compressed programme distribution 

Post production as the main tool of the production suite. 


...end... 
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DIGITAL RECORDING - DISKS 


Introduction 


Why do we use disks? What's wrong with tape? What is all the 
hype about tapeless broadcasting? 


A few commonly asked questions. Firstly, let's have a look at what 
disks are good at:- 


Complete digital transparency (for uncompressed disks), allowing an 
almost infinite number of play, manipulate and record passes to be 
made. This is the cornerstone of digital video compositing. But 
why do we need digital transparency? Consider even the most 
basic of video sequences in advertising: how many times in 
everyday life do you see dancing cows chased by lawn mowers? Or 
milk bottles dancing and skipping around the delivery person's 
feet? All of these are products of multi-pass digital video 
compositing. Real time digital compositing relies on compositing 
a couple of elements together, recording the output, then taking 
the recorded output and compositing more elements on top. This 
process repeats until the program is complete. Imagine the 
quality of the first layer of video after 10 play and record 
passes on a recorder that is not digitally transparent. 


There are uses for compressed disks, mainly in live and broadcast 
play-out environments. 


Total random access to program material, allowing true non linear 
editing, no pre-roll requirements or rewinding. 


No maintenance. Disks have typical MTBF (mean time between 
failures) figures of half a million hours or, putting that into 
more meaningful units, half your lifetime! You would indeed be a 
brave man to expect that figure, but it is still an impressive 
number if you are looking for several years of maintenance free 
running. 


Comparison of media bandwidths. 


601 video raw data rates are generally of the order of 170 to 210 
M bits/second after blanking removal. The two media types of 
interest that satisfy today's video requirements are Winchester 
disks and magneto optical disk drives. So what usable data rates 
can we achieve from these two technologies? 


Winchester disks: there is a lot of hype about hard disk transfer 
rates. A state of the art 3.5" 1 G byte drive peaks at about 50 
M bits/second, assuming you are on track and reading from the 
outer, high capacity zones. This figure can drop as low as 25 
M bits/second as you read further in towards the centre of the 
drive. This technique is known as zone bit recording, and allows 
the drive manufacturer to increase drive capacity and maintain 
areal density (bits per square inch) across the drive surface. 
Sustained rates, including seeks, head switches, SCSI overhead 
and thermal calibration routines, typically drop to about 70% of 
this figure. That's the good news. 


Magneto optical: the story gets even worse! A typical CD ROM, 
i.e. Compact Disk, yields about 1.5 M bits/second. Current MO 
drives of 1 G byte yield 10 M bits/second. Rumours suggest this 
figure could be as high as 20 M bits/second next year. MO drives 
also have a major problem in that their seek times are typically 
in the 100s of milliseconds, compared to 1 or 2 ms track to track 
seek times for Winchester drives. Obviously long seek times 
effectively reduce the average bit rate coming off the drive, 
making non linear editing more difficult. 


Why are MO drives so inferior at present? Typical areal 
densities on Winchester drives are far higher than on MO, 
yielding more bits per second flying under the heads and 
therefore higher data transfer rates. This will get better with 
time. 


Physical size is an important factor. Winchester drive platters 
are double sided whereas MO is single sided, so the same number 
of bits occupies double the surface area. Winchester disks can 
and do use multiple platters to increase data capacity; MO has to 
increase its diameter to increase capacity. The net result of 
this is a much larger, slower and more massive head actuator 
system for MO drives than for the equivalent Winchester drive. 
The big advantage of MO, removable media, also works against it 
for head fly height. Winchester drives operate with small, light 
head actuators in a sealed, particle free environment, whereas MO 
operates in particle laden free air. Thus the head to disk 
tolerance (head fly height) is larger for MO, and therefore the 
areal density, capacity and transfer rates are all inferior. 


Some fairly novel techniques have been used to improve the seek 
time shortcomings of MO. Pioneer, for one, use two head actuators 
in their analogue disk machine, allowing one head set to be 
playing while the second head set seeks the new track. This 
helps solve one of the MO shortcomings; it does not, however, 
address any of the others outlined above. 


It is worth spending a short time looking at the analogue MO. It 
obviously yields huge durations, approx 30 minutes. As used it 
is an analogue only machine and far from transparent. This kind 
of machine uses an analogue FM (frequency modulation) recording 
technique on an eight or twelve inch disk and as such has a 
fairly low frequency response and relatively poor signal to noise 
ratio for this application. The basic technology is however very 
similar to that of the digital MOs above. 


These numbers are obviously way out compared to real video rates, 
8 bit 601 minus blanking consuming about 170 M bits/second and 10 
bit 601 about 210 M bits/second. Assuming the current passion for 
bandwidth at all costs continues, and we take 10 bit 601 as a 
goal, current hard drives are about 8:1 short on raw bandwidth 
and MO drives about 20:1 short. Having just proved to you 
that real time digitally transparent disk machines cannot be 
made, how do we do it? 
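
The figures above can be reproduced with a short calculation. The sketch assumes a 
625 line active picture of 720 x 576 pixels at 25 frames per second with 4:2:2 
sampling (two samples per pixel on average); these picture dimensions are an 
assumption, as the text quotes only the resulting rates. The disk rates are the 25 
and 10 M bits/second figures quoted earlier. 

```python
def active_rate_mbps(bits, width=720, lines=576, frames_per_s=25):
    """Active picture data rate in Mbits/second for 4:2:2 sampling:
    one luminance and, on average, one colour difference sample per pixel."""
    return width * lines * frames_per_s * 2 * bits / 1e6

def shortfall(required_mbps, available_mbps):
    """How many times short of the required rate a device is."""
    return required_mbps / available_mbps

rate_8 = active_rate_mbps(8)       # about 166, quoted as "about 170"
rate_10 = active_rate_mbps(10)     # about 207, quoted as "210"
hard_disk = shortfall(210, 25)     # inner zone hard disk rate
mo = shortfall(210, 10)            # current MO drive rate
```

The hard disk comes out at about 8:1 short and the MO drive at about 20:1 short, 
matching the ratios given. 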


Practical realisation of a digitally transparent recorder. 


Currently there are four techniques for meeting the bandwidth 
required. 


1/ Parallel transfer, giving access to multiple heads at once in 
a standard Winchester disk. 


2/ RAID, redundant array of inexpensive disks. Ganging enough 
standard drives together to yield the required bandwidth. 


3/ Compressed off line disks. 


4/ Tapeless broadcast disks. An extension of the RAID 
architecture. 


All of these techniques have their merits. Parallel transfer has 
low component cost on its side but huge risk factors against it. 
I should probably qualify that a little: the risk is for the 
manufacturer foolish enough to pursue this course of action, not 
the user. Abekas has followed the parallel transfer method with 
all of its disk machines; more of this later. 


RAID is probably the safest approach but has a major down side 
for the stand alone disk manufacturer in its cost: getting the 
bandwidths required involves the use of 8 to 20 disk drives. 
This leads to huge video capacity, approximately 10 G bytes, 
equal to 8 minutes of storage, and a component cost similar to 
the selling price of our cheapest parallel transfer offering. 
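
A rough check of these RAID figures is sketched below. The per drive rate of 17.5 
M bits/second is an assumption: it is 70% of the 25 M bits/second inner zone rate 
quoted earlier, taken as a worst case sustained figure. 

```python
import math

def drives_needed(required_mbps, per_drive_mbps):
    """Smallest number of drives whose combined rate meets the requirement."""
    return math.ceil(required_mbps / per_drive_mbps)

def minutes_of_storage(capacity_gbytes, video_rate_mbps):
    """Recording time for a given capacity at a constant video rate."""
    seconds = capacity_gbytes * 8000.0 / video_rate_mbps   # 1 G byte = 8000 M bits
    return seconds / 60.0

worst_case_drives = drives_needed(210, 17.5)   # 10 bit video, worst case drive rate
duration = minutes_of_storage(10, 170)         # 10 G bytes of 8 bit video
```

Twelve drives falls inside the 8 to 20 range quoted, and 10 G bytes at 170 
M bits/second lasts a little under 8 minutes. 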


Figure 1 shows a typical Winchester disk; let's have a fairly 
detailed look inside one. The key components are the data 
recording surfaces, read/write heads, head positioning actuator, 
and data multiplexer and pre-amp. Most modern disk drives rotate 
at between 60 and 90 Hz and write circular, concentric data 
tracks known as cylinders. Data is written as sequences of 
magnetic flux changes using a self clock extracting pattern that, 
simplistically put, describes two bits of information as three 
magnetic flux changes on the media. Once recorded, data is read 
back by the same head that recorded it. The heads are kept on 
track either by a separate servo surface and micro positioning 
mechanical feedback loop, or by embedded servo information 
written between data sectors. Data is written in blocks known as 
sectors; sectors are to some extent user programmable in length, 
typically 512 bytes. In SCSI drives each data sector is given a 
unique address known as a logical block address or LBA. Modern 
disks have between 2 and 8 double sided data platters, therefore 
once on track, several heads and data tracks are available across 
the head stack. Other data cylinders are accessed by moving the 
head actuator and swinging the head stack across the media. Most 
high performance drives use voice coil type head actuators, with 
all positional feedback derived by reading the data surfaces; 
using these techniques high performance drives achieve track 
pitches of 4000 tracks per inch. Lower performance drives, using 
less than 1000 tracks per inch, use simple stepper motor head 
actuators. Head signal levels are measured in microvolts, and 
normally the head multiplexer and pre-amplifier are contained 
within the sealed disk enclosure known as the HDA. 


Data recovery is performed by programmable pre-emphasis networks,
filters and data slicers. Due to the constant data density, the
media data rate across the drive varies 2:1 from inner to outer
cylinder, therefore the data filters, pre-emphasis circuits and
PLL all have to be programmable on a zone by zone basis. A
typical modern drive will have 10 different data rate bands
across the media. The data is decoded from RLL 1,7 encoding
format and re-clocked by a clock system phase locked to the off
head data, hence the self clock extracting data pattern.
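

The run-length limit that makes an RLL 1,7 pattern self clocking can be illustrated with a short check. This is only a sketch of the constraint itself: a 1 stands for a flux change, and between consecutive flux changes there must be at least one and at most seven cells without a change. The function name is hypothetical, and the actual encoding tables used by any particular drive are not shown here.

```python
def satisfies_rll(code_bits, d=1, k=7):
    """Return True if every run of 0s between two 1s is d..k cells long.

    code_bits is a sequence of 0/1 channel bits, where 1 marks a flux
    change. Runs before the first 1 and after the last 1 are ignored.
    """
    ones = [i for i, bit in enumerate(code_bits) if bit == 1]
    for first, second in zip(ones, ones[1:]):
        zeros_between = second - first - 1
        if zeros_between < d or zeros_between > k:
            return False
    return True


print(satisfies_rll([1, 0, 1, 0, 0, 1]))    # legal spacing
print(satisfies_rll([1, 1, 0]))             # flux changes too close
print(satisfies_rll([1] + [0] * 8 + [1]))   # too long without a change
```

The upper limit guarantees that flux changes arrive often enough for the phase locked clock to stay in step with the off head data.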


Parallel transfer disk machines are an extension of the above.
It can be seen that increasing the bandwidth is a fairly simple
matter: remove the head multiplexer and use multiple heads at
once, thereby doubling, trebling etc. the available data
bandwidth. A good theory; unfortunately this involves taking the
lid off the disk drive in a class 3 clean room and then installing
flexible PCB circuits into the drive, allowing access to all
heads. Several problems still lie ahead: the massively increased
capacitance of the flex circuit, way higher than the drive
manufacturers have to cope with, minimising cross-talk between
the 15 heads with signal levels measured in micro-volts, and
producing 15 sets of analogue read/write electronics, to name but
a few. The mechanical stability of the head stack poses major
problems. The drive manufacturers reduce the mass of the head
stack as much as possible to reduce seek times; this has the
unfortunate side effect of making it move with temperature and
acceleration, thereby skewing the head to head data fairly
dramatically. The head skew problems can be minimised by
recording index marks on the media and realigning the multiple
streams with FIFOs.
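

The realignment idea can be sketched in software. This is a minimal illustration only, not the hardware used in any real machine: each head stream is modelled as a list, the marker `INDEX` is a hypothetical stand-in for a recorded index mark, and each FIFO simply discards whatever arrives before its own mark so that all streams start from the same point.

```python
from collections import deque

INDEX = "IDX"  # hypothetical token standing in for a recorded index mark


def realign(streams):
    """Align skewed head streams on their index marks.

    Returns a list of tuples, one tuple per word position, holding the
    word read by each head at that position after the index mark.
    """
    fifos = [deque(stream) for stream in streams]
    for fifo in fifos:
        # Drop everything that arrives before this head's index mark.
        while fifo and fifo[0] != INDEX:
            fifo.popleft()
        if fifo:
            fifo.popleft()  # drop the index mark itself
    usable = min(len(fifo) for fifo in fifos)
    return [tuple(fifo[i] for fifo in fifos) for i in range(usable)]


head_a = ["junk", INDEX, "a0", "a1"]   # this head sees its mark one word late
head_b = [INDEX, "b0", "b1", "b2"]
print(realign([head_a, head_b]))
```

Whatever the skew between heads, the words paired together are those recorded the same distance after the index mark.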


Media defect management is normally handled by the disk drive,
using either Fire code data protection, Reed Solomon error
correction on current drives, or SCSI sparing. Sparing works by
allocating spare sectors on each drive cylinder and re-mapping
the data in the sectors containing the media defects to the
spares. Obviously after modification none of these features can
be used. The current range of parallel transfer disk
machines use a full video rate Reed Solomon error correction
scheme to cope with media defects.


Modern disk drives utilise a zone bit recording strategy, which
in simple terms means maintaining constant bit densities. This
leads to much higher data rates and capacities on the outer
tracks of the drive; a typical rate spread would be 24 to 48 M
bits/ second inner to outer track. To maintain maximum
flexibility and reduce rotational latency problems the data is
configured as one field per spin on the drive, thus allowing
fields to be accessed individually. The ability to pick one
field at random permits exceptional vary speed performance to be
achieved and of course allows the drive to be used in a non
linear fashion. The principle of one field per spin implies a
fixed length data track, whereas, as discussed above, zone bit
recorded disks have variable track lengths. The extra track
lengths of the outer zones are effectively thrown away, allowing
a 60% capacity utilisation for a 1G Byte disk.
Our current family of disk drives yield peak data rates of 50 M
bytes/ second.
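

A rough feel for the capacity thrown away by one field per spin can be had from a simple model. The sketch below is an illustration under stated assumptions, not a description of any particular drive: it assumes 10 zones whose data rates step linearly from 24 to 48 M bits/ second, the same number of tracks in every zone, and a fixed track length equal to what the innermost zone holds in one revolution. The function name is hypothetical.

```python
def one_field_per_spin_utilisation(inner_rate, outer_rate, zones=10):
    """Fraction of the zoned capacity used when every track carries only
    as much data as the innermost (slowest) zone can hold per spin."""
    # Assumed: zone data rates step linearly from inner to outer.
    step = (outer_rate - inner_rate) / (zones - 1)
    zone_rates = [inner_rate + step * zone for zone in range(zones)]
    # Assumed: equal track count per zone, so total zoned capacity is
    # proportional to the mean zone rate.
    mean_rate = sum(zone_rates) / zones
    return inner_rate / mean_rate


utilisation = one_field_per_spin_utilisation(24.0, 48.0)
print(f"{utilisation:.0%} of the zoned capacity is usable")
```

Under these assumptions the model gives about two thirds, the same order as the 60% quoted above; the model makes no attempt to account for the remaining difference.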
RAID machines would appear at first sight to be fairly simple
to implement: 8 fairly modern SCSI drives (drive per bit?)
yielding an effective record rate of 26M bits per second per
drive. Unfortunately SCSI insulates the drive from the real
world; it hides the data surfaces from the user behind 1M Byte of
cache RAM. SCSI drives also employ all kinds of techniques, like
logical block addressing and sector sparing, to further isolate
the user from having any idea where the data resides on disk. Not
knowing where the data is is not a problem for straight play
modes: as long as the heads are always passing over wanted data
the drive cache can reorder the data on the fly and give
contiguous play. The problems start when the disk is asked to
jump around in vary speed or non linear play modes. Because the
data is not in contiguous chunks the drive will pre-read data
that is not required, the field after the out point of a non
linear clip for example, thus wasting time and reducing the
useful off drive bit rate. The solution to all of the above
problems is more bandwidth. To make a RAID with all of the
features that we have come to expect a disk machine to have,
excellent vary speed and non linear random access capability,
requires the machine to have one and a half to three times raw
video bandwidth. An excellent example of this is the Quantel
Dylan disk machine, which uses 20 SCSI drives to achieve play out.


RAID implies redundant array; there are 6 levels of redundancy,
from zero to self rebuilding and parallel copies of data.
Obviously no forms of redundancy come for free. With drive MTBF
figures pushing 1 million hours, small short duration RAIDs used
for temporary storage do not require data protection. When
considering larger central storage systems and tapeless play out
devices the story is different. One hour of video consumes 90G
Bytes; a 10 hour play out system using 5G Byte drives with 1
million hour MTBF has a total MTBF of roughly 200 days. Reed
Solomon error correction is probably the most common and
practical solution for systems of this size, permitting a drive
to go down and allowing the check data on the other active drives
to reconstruct the missing data stream until a new drive can be
fitted.
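

The MTBF figure follows from simple arithmetic, sketched below. The calculation assumes that drive failures are independent, that each drive has a constant failure rate, and that the loss of any one drive counts as a system failure; none of these assumptions is stated in the text, and the variable names are illustrative.

```python
GBYTES_PER_HOUR = 90            # video storage per hour, from the text
PLAYOUT_HOURS = 10
DRIVE_GBYTES = 5
DRIVE_MTBF_HOURS = 1_000_000

# Number of drives needed to hold the whole play out store.
drives = (GBYTES_PER_HOUR * PLAYOUT_HOURS) // DRIVE_GBYTES

# With independent drives, failure rates add, so the MTBF divides.
system_mtbf_hours = DRIVE_MTBF_HOURS / drives
system_mtbf_days = system_mtbf_hours / 24

print(drives, "drives")
print(round(system_mtbf_days), "days between failures on average")
```

This gives 180 drives and a little over 230 days, which is the same order as the figure of 200 days quoted above.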


RAID machines are only feasible with fixed media, i.e. Winchester
disks; imagine, if you will, a removable media MO RAID with its
images spread over 20 removable disks.


Compressed off line disks get round the problems described above
through the use of video compression. The average non linear
editor disk will be a standard 2G Byte high performance SCSI
drive using 15 to 20:1 video compression ratios. If the
comparison is made between the compressed video bit rate (10M
bits/ second) and the drive bandwidth (25 - 50M bits/ second) the
bandwidth overhead discussed above is obvious.
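

The compressed bit rate can be checked against the storage figure quoted earlier. The sketch below assumes that the 90G Bytes per hour figure for uncompressed video applies here and that G means 10^9; the function names are illustrative.

```python
RAW_GBYTES_PER_HOUR = 90   # uncompressed video, figure quoted earlier


def raw_rate_mbit():
    """Uncompressed video rate in M bits/ second."""
    return RAW_GBYTES_PER_HOUR * 1e9 * 8 / 3600 / 1e6


def compressed_rate_mbit(ratio):
    """Video rate in M bits/ second after compression by ratio:1."""
    return raw_rate_mbit() / ratio


for ratio in (15, 20):
    print(f"{ratio}:1 gives {compressed_rate_mbit(ratio):.1f} M bits/ second")
```

At 20:1 the result is the 10M bits/ second quoted above, a fraction of the 25 to 50M bits/ second a single drive can sustain.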


Tapeless broadcast systems are in reality huge RAID machines
configured for massive bandwidth coupled with long duration. As
we have seen, long duration comes for free when trying to generate
massive bandwidth in RAID architectures. The principle of
operation is to provide multiple users with simultaneous access
to a huge central store of video clips. The simultaneous access
is the most taxing part of the system; again the only solution is
more bandwidth. Some systems are now appearing on the market;
some work, some do not. Compared to a tape archive system or a
tape 'jukebox' system, disk storage has massive advantages. It
also has massive [illegible] and [illegible].


The above shows that two types of disk machine exist at present:
a short duration, cheap, wide bandwidth, agile machine for post
production and computer graphics type applications, and much
longer duration, less agile streaming bulk storage devices. RAID
machines cannot at present compete with parallel transfer devices
on a cost per play basis; likewise parallel transfer cannot
compete on a price per minute basis.


Future Media and Trends 


Winchester disk drives have been, and will be, with us for a long
time. Where have they come from and where are they going? The first
disk machine Abekas produced was the A62; it used two 8" Priam
disk drives to achieve video play and had head transfer rates
of 10M bits/ sec. We then migrated on the A66 to a single 5 1/4"
drive with 15 heads and 15M bits/ sec head transfer rates. Our
latest offering uses a single 3 1/2" drive with 15 data heads and
a head transfer rate of 28M bits/ sec. Obviously data rate
increases will come; how far they will go is not easily
predictable. I suspect we are reaching the point where the basic
physics involved will not allow us to achieve more than a 10
times increase in data rates. With head fly heights already
measured in micro inches, how much better can we make the head/
platter interface? The main thrust of development seems to be in
head technology: the use of magneto resistive heads as opposed to
conventional wound "flying coil" types. The main difference
between these is that magneto resistive heads measure absolute
flux as opposed to change in flux, allowing more consistent data
rates to be achieved across the whole data surface. Faster spin
speeds for drives allow shorter average access times. Five years
ago rotational speeds were at 60Hz, current drives are at 90Hz,
and with the migration to smaller diameter platters (2.5") we will
see further speed increases. The disk manufacturers' main thrust
will be toward increasing areal density, which allows them to
reduce the number of platters for a given capacity. For example,
in 1992 Seagate launched a state of the art 3 1/2" 1G Byte drive;
it had eight platters, divided up as one servo surface and 15 data
surfaces. Last November IBM released a 1G Byte 3 1/2" drive with 2
platters; the servo was now embedded in the data surfaces and the
drive has 4 data heads. The next leap will be a single platter
1G Byte drive. We may see a larger proliferation of "two headed"
drives, a very limited parallel transfer, allowing 2x data rates.
This trend may very well prove to be a blind alley: as track
density increases, the drive manufacturers' ability to keep two
heads on track consistently with temperature etc. may prove too
difficult.


Magneto Optical drives are, relatively speaking, still in their
infancy; this is good news and probably means they have a way to
go in available data rates. As shown above, their performance
will have to increase by a factor of 10 or so before they will be
of serious interest to the real time transparent disk market.
There were some announcements a year ago about a new type of MO,
the ETOM (Electron trapping optical memory) from OPTEX in the US.
This uses different colour lasers to shift electron energy levels
within the media. It is then possible to detect that they have an
elevated energy state by exciting them during read with a
different colour laser. The claimed bandwidth and storage
capacity are huge: 14 G bytes storage and 120 M bits/ second from
a 130 mm optical disk. This is still very experimental, and I
will reserve judgement on the viability of the technique until we
see product in the market place.


Conclusion 


Real time digitally transparent cache recorders are here to stay. 
For the foreseeable future they will be Winchester disk based, 
and use raw uncompressed bandwidth to achieve their function. 
The sheer volume of MO drives or the massive compression ratios 
required will mean we will have to wait for Magneto Optical 
technology to catch up before we see it in everyday real time
use. We will see more and more hype about compressed disk
systems over the next years. Compression no doubt has a place in
broadcast, but it has to be used with care and only where
applicable. Used in the wrong application, high quality multi
generation pictures will be a thing of the past.
David Bradshaw (BBC)


DIGITAL TELEVISION CODING AND INTERFACE STANDARDS 
(CCIR RECOMMENDATIONS 601 & 656) 


D J Bradshaw*


1. INTRODUCTION 


1.1 Digital Studios 


It has been recognised for many years that there would be significant advantages in the use of 
digital techniques for television signal processing and distribution. Even so, it has only 
recently become possible to consider complete digitisation of the television system from the camera 
output to the broadcast transmitter. Apart from the general progress in digital integrated circuit
technology, this is the result of advances made in the recording of digital signals on magnetic
tape.


For a considerable period, it was assumed that television signals would be digitised in composite 
PAL form and digital systems based on composite PAL have been developed for signal distribution 
from studios to transmitters. However, it was recognised that the use of digital PAL signals 
throughout the network would place many quality constraints on the operation of digital studios. 
In particular, several important studio operations require the signals to be decoded to component 
form in order to avoid the complication of processing modulated colour signals. Imperfections in 
the multiple PAL decode and recode operations involved would cause a rapid deterioration of picture
quality.


Operation with PAL, albeit in digital form, would not have improved the quality of picture seen by 
the viewer, except in respect of multi-generation recordings, and would have suffered all the well- 
known defects of PAL, principally cross-colour and cross-luminance, as well as the limited 
chrominance bandwidth. The difficulties of video-tape editing, due to the eight-field PAL sequence,
would also remain.


An alternative approach is to work towards a system based on digital component signals, in which 
ultimately the signals would remain in component form throughout, only being encoded as PAL signals 
for broadcast transmission. Component Coding provides greater freedom in signal processing and 
recording, offers potentially greater chrominance bandwidth and eliminates the problems caused by 
PAL subcarrier in editing operations and ‘footprints’ caused by the PAL coding process. Breaking 
away from the composite signal formats of PAL and SECAM in Europe and NTSC elsewhere offered the 
possibility of a compatible world-wide standard for both 625 and 525-line countries. With this 
approach, there would be a changeover period in which some PAL decoding and recoding would still be 
required, but as the conversion to digital components operation progressed, the decoders and
recoders would gradually be removed.


With this approach in view, the CCIR has defined the main parameters of a digital component signal 
coding standard for use in television production facilities (Recommendation 601). Subsequently, 
more detailed specifications based on the same parameters, but using bit-parallel and bit-serial 
multiplexed component signals have been established to provide for digital interconnections.


* Design & Equipment Department, British Broadcasting Corporation 


1.2 EXPERIMENTAL EQUIPMENT 


During the derivation of the digital standard, experimental equipment was developed to investigate 
both the basic quality of potential standards and their suitability for signal processing. The 
development occurred primarily in two phases, each culminating in a series of demonstrations to
committees of the EBU at which the results were assessed.


Preliminary discussions in the EBU were constrained by the need to minimise the total data rate of
a components standard. Besides the general disadvantage of needing higher-speed logic for greater
data rates, there were two specific limitations. First, for transmission over 140 Mbit/s digital
telecommunications links it was thought to be undesirable to use bit-rate reduction methods that
would affect picture quality; that is, methods other than removal of the blanking periods. The
140 Mbit/s capacity therefore corresponded to a gross data rate (including blanking) of about
160 Mbit/s. The second constraint was that, at the time, a similar figure was accepted as a
practical limit for the gross capacity of digital video recorders. As a result, a system with a
gross data rate of 160 Mbit/s was chosen from a number of competing systems for further assessment
by the EBU. This used 12 MHz sampling of luminance and 4 MHz sampling for each of the two colour
difference signals and consequently became known as the 12:4:4 system. Equipment working on 12:4:4
sampling was evaluated at demonstrations in April 1980.
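

The gross data rate of a component system follows directly from its sampling frequencies and word length. The sketch below assumes 8 bits per sample for the 12:4:4 system (the text does not state the word length used in the experimental equipment) and, for comparison, applies the same sum to the 13.5/6.75/6.75 MHz frequencies adopted later in these notes. The function name is illustrative.

```python
def gross_rate_mbit(luma_mhz, chroma_mhz, bits_per_sample=8):
    """Gross data rate in Mbit/s for one luminance channel and two
    colour-difference channels, blanking included."""
    words_per_microsecond = luma_mhz + 2 * chroma_mhz
    return words_per_microsecond * bits_per_sample


print(gross_rate_mbit(12, 4))               # the 12:4:4 system
print(gross_rate_mbit(13.5, 6.75))          # 13.5/6.75/6.75 MHz, 8 bits
print(gross_rate_mbit(13.5, 6.75, 10))      # the same at 10 bits
```

The first result reproduces the 160 Mbit/s figure given above.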


At these demonstrations, although the basic picture quality achieved by 12:4:4 sampling was judged 
to be acceptable, the system was found to have deficiencies for signal processing in two important 
respects: first, the chrominance bandwidth allowed with 4MHz sampling was found to be inadequate 
for high quality colour-matting (chromakey), and, secondly, the 12MHz luminance sampling frequency 
was found to be too close to the Nyquist sampling limit (11MHz) to allow simple luminance 
filtering. 


Subsequently, to overcome the colour-matte problem, the capacity of the colour-difference channels
was increased by using sampling frequencies in the ratios 4:2:2 and, to increase the sampling
headroom available for interpolation, luminance sampling frequencies of 13, 13.5 and 14MHz were
included in addition to 12MHz. These increases were made possible by developments in digital video 
recorders which had resulted in increased capacity and by the acceptance that suitably transparent 
methods of bit-rate reduction were now available for use on transmission links. The systems were 
compared at a second set of demonstrations held in January 1981. From these tests, and in 
consultation with the SMPTE, the 13.5/6.75/6.75 MHz system of sampling was chosen for use in studio 
equipment. 


The description of component signal coding systems contained in these notes has two main parts. 
The first covers digital component signal coding in general terms and, where necessary, interprets 
the modes of operation implied by the standard. The second part describes the interface standards 
used by digital video equipment. 


2. DIGITAL COMPONENT SIGNAL CODING 


2.1 STANDARD HIERARCHY


As the digital interface standard, based on 4:2:2 sampling, is to be used to connect together
major items of digital studio equipment, the 4:2:2 sampling parameters are fundamental to any
digital studio system. Even so, CCIR Recommendation 601 envisages the use of other standards,
provided that conversion to 4:2:2 sampling is straightforward. In practice, this restricts the
choice to sampling frequencies with simple relationships to the 4:2:2 frequencies. Thus, 4:2:2
sampling could be used as one of a family or hierarchy of compatible coding standards, such as that
shown in Table 1.


[Table 1 - a hierarchy of compatible coding standards. Columns: LUMINANCE, CHROMINANCE.]


The lower standards of Table 1, that is 4:1:1, 3:1:1 and 2:1:1, reduce picture quality noticeably. 
Because of this, their use is limited to applications in which the reduced data rates are of 
paramount importance, such as in portable ENG equipment or to form part of a bit-rate reduction 
system.
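

The sampling frequencies implied by each level of the hierarchy can be worked out from the ratio notation. The sketch below assumes that "1" in the notation stands for 3.375MHz, that is one quarter of the 13.5MHz luminance frequency of the 4:2:2 level; the levels listed are those named in the text, and the 8-bit data rates are given for comparison only.

```python
UNIT_MHZ = 3.375   # assumed: "1" in the ratio notation, i.e. 13.5MHz / 4


def sampling_frequencies(level):
    """Return (luminance, colour-difference 1, colour-difference 2)
    sampling frequencies in MHz for a level written like '4:2:2'."""
    parts = [int(part) for part in level.split(":")]
    return tuple(part * UNIT_MHZ for part in parts)


for level in ("4:4:4", "4:2:2", "4:1:1", "3:1:1", "2:1:1"):
    freqs = sampling_frequencies(level)
    gross_mbit = sum(freqs) * 8
    print(level, freqs, "MHz,", gross_mbit, "Mbit/s gross at 8 bits")
```

Because every frequency is a whole multiple of the same unit, conversion between levels needs only simple fixed ratios.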


At present, however, there seems to be little interest in digital component standards lower than 
4:2:2. In contrast, 4:4:4 sampling is expected to be of considerable importance in some signal 
processing applications, primarily for RGB but also, probably, for luminance and colour difference 
signals (Y:Cr:Cb). In the context of conversions between analogue RGB and the 4:2:2 Y:Cr:Cb 
digital standard, 4:4:4 signals form a convenient step in the conversion process. 


Because of the simple fixed ratios between the sampling rates in Table 1, conversions between them 
can be achieved with relatively simple fixed-value interpolators. Apart from the obvious loss of 
bandwidth which must occur when initially changing to a lower sampling frequency, other conversion 
impairments, although tending to be cumulative, are minor. Repeated conversions are unlikely to be 
a frequent occurrence in a normal studio environment.


2.2 SAMPLING STRUCTURE


The digital coding standards of Recommendation 601 are based on line-locked sampling. This 
produces an orthogonal sampling grid in which samples on the current line fall directly beneath 
those on previous lines and fields, and exactly overlay samples on the previous picture, as shown 
in Figure 1. This orthogonal structure has many advantages for signal processing, including the 
simplifications of filters and repetitive control waveforms. 


In addition to being locked in frequency, the sampling is locked in phase, one sample being 
coincident with the line-time reference point (the half-amplitude point of the falling edge of the 
line synchronising pulse). This ensures that different sources produce samples nominally at the 
same positions in the picture. By making this feature common to all members of the sampling 
hierarchy (Table 1), many of the samples from different levels of the hierarchy become co-sited, 
further simplifying conversions within the hierarchy.


Setting the sampling frequency at 13.5MHz provides a measure of commonality between 625/50 and 
525/60 systems because this frequency has the property of being an exact harmonic of the line rate 
on both scanning standards (864 and 858 times line frequency for 625- and 525-line systems 
respectively). The different formats of digital line for the two standards are shown in Figure 2. 


Each digital line consists of a blanking period followed by an active line period. To provide 
further commonality, the active line period is 720 samples, on both standards. This is sufficient 
to accommodate the analogue active line period on either standard with enough extra capacity to 
include the analogue blanking edges over the full spread of timing tolerances. This prevents any
possibility of digital blanking cropping the active line period of the signal. Analogue blanking
should be applied once only, either at the source or, preferably, in monitors and at the conversion 
to composite signal format for broadcast transmission. This avoids the extension of blanking 
periods and the softening of blanking edges which occurs in analogue systems with the repeated 
application of blanking. 
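

The line structure figures quoted above can be verified with exact arithmetic. The line frequencies used below are not given in the text and are stated here as assumptions: 15625Hz for 625/50 (625 lines at 25 pictures per second) and 4.5MHz/286 for 525/60.

```python
from fractions import Fraction

SAMPLING_HZ = Fraction(13_500_000)
ACTIVE_SAMPLES = 720

# Assumed line frequencies for the two scanning standards.
LINE_RATES_HZ = {
    "625/50": Fraction(15_625),
    "525/60": Fraction(4_500_000, 286),
}


def samples_per_line(standard):
    """Luminance samples per total line; must be a whole number if the
    sampling frequency is an exact harmonic of the line rate."""
    ratio = SAMPLING_HZ / LINE_RATES_HZ[standard]
    if ratio.denominator != 1:
        raise ValueError("sampling frequency is not line-locked")
    return int(ratio)


for standard in LINE_RATES_HZ:
    total = samples_per_line(standard)
    print(standard, total, "samples per line,",
          total - ACTIVE_SAMPLES, "in digital blanking")
```

Both ratios come out as whole numbers, 864 and 858, leaving 144 and 138 luminance samples respectively for the digital blanking period.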


2.3 LUMINANCE AND COLOUR-DIFFERENCE FILTERING 


At some stages during the development of digital studio systems it may be necessary to make
multiple conversions of component signals, to and from digital form. Therefore, the filters used
in these conversions require a tight specification to ensure that signal quality can be maintained. 
This is ensured by using filter characteristics which have a flat response in the passband (Fp), a 
transition band which attenuates the half sampling frequency (0.5 Fs) region and a stopband which 
suppresses frequencies above Fs-Fp. In addition, the sample-and-hold action of the digital-to- 
analogue conversion process contributes to the overall filtering effect and needs to be taken into 
account. 


Luminance signals, and 4:4:4-level wideband signals where used, are subject to the sampling and
filtering processes shown in Figure 3(a). The low-pass filters associated with the a-d and d-a
converters both require passbands which extend to 5.5MHz and contribute minimal group delay
distortion. A rapid rate of cut is needed in the transition band to reduce alias components in the
5.5 to 8MHz region, but, if the cut is too sharp, satisfactory passband performance is much more
difficult to attain. For 13.5MHz sampling, an attenuation of 12dB at 6.75MHz has been found to be
a suitable compromise.


In the stopband region, however, there are slight differences in the action of the a-d and d-a
filters. The filter preceding the a-d converter has to suppress high frequencies in the input
which would otherwise be aliased by the sampling process to fall within the passband. For normal
signals these components are of low amplitude, so that a stopband attenuation of 40dB is
satisfactory and is reasonably easy to obtain. In comparison, there is much more energy to
suppress in the digital (sampled) signal, as the baseband signal is replicated at full amplitude
around each harmonic of the sampling frequency. Fortunately, in the d-a conversion, the sample-
and-hold action contributes a sinx/x frequency characteristic with nulls at 13.5MHz and its
harmonics. This assists the stopband action of the low-pass filter by attenuating the main alias
components, with the result that an a-d filter is sufficient. A further consequence of the sample-
and-hold is a noticeable reduction of high passband frequencies. This is corrected by including an
equaliser which has the inverse characteristic over the passband region, thus producing a flat
passband response overall.
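

The size of the sample-and-hold effect can be estimated from the sinx/x characteristic. The sketch below assumes an ideal hold lasting one full sample period at 13.5MHz; a practical converter may differ, and the function name is illustrative.

```python
import math

SAMPLING_MHZ = 13.5


def hold_gain_db(freq_mhz, sampling_mhz=SAMPLING_MHZ):
    """Gain in dB of an ideal full-period sample-and-hold (sinx/x)."""
    x = math.pi * freq_mhz / sampling_mhz
    if x == 0:
        return 0.0
    gain = abs(math.sin(x) / x)
    return 20 * math.log10(gain) if gain > 0 else float("-inf")


print(f"{hold_gain_db(5.5):.2f} dB at the 5.5MHz passband edge")
print(f"{hold_gain_db(6.75):.2f} dB at half the sampling frequency")
print(f"{hold_gain_db(13.5):.0f} dB at the sampling frequency")
```

On this model the loss at the passband edge is roughly 2.5dB, which is the amount the equaliser has to restore, while the response at 13.5MHz falls to a deep null limited only by floating point precision.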


The requirements for colour difference filters in 4:2:2 signal conversions are, in general, a
frequency-scaled version of those for the luminance signal. Thus the edge of the passband is set
at 2.75MHz and the stopband extends from 4MHz. There are, however, two factors which lead to minor
differences in the specification. One factor is that, for the colour difference signal
conversions, digital filters can be used as an alternative and the other is that the levels of
impairment introduced by the filters are different in the luminance and colour difference signals.


The colour difference signals can be converted directly between analogue form and 6.75MHz digital 
samples as shown in Figure 3(b). Apart from the scaling of frequencies by 2:1, this process is 
similar to that described for luminance, shown in Figure 3(a). 


A problem with this approach is that scaling the frequency characteristics doubles the delay in the
colour difference conversion processes. A compensating padding delay is therefore required in each
of the luminance conversions. An alternative approach is to convert the R, G and B signals to
13.5MHz digital samples and to derive 4:2:2 Y, Cr, Cb signals from them. This avoids the problems
of analogue delay-matching associated with converting directly at the 4:2:2 sampling frequencies
and has the stability benefits of digital techniques for the matrix and filter operations. The
availability of integrated circuits set to perform this function would lead to its greater use.


The sampling and filtering processes involved when the a-d and d-a conversions are made using RGB 
signals are shown in Figure 3(c). In addition to the processes shown in Figure 3(a), this includes 
a digital low-pass filter and sampler for conversion to the 6.75MHz sample rate and an 
interpolator, shown conceptually as a low-pass filter and sampler, for reproducing 13.5MHz values. 
The necessary matrixing processes are omitted from this diagram for clarity. The complexity of 
these digital filters can be reduced considerably by choosing a low-pass characteristic which is 
skew-symmetric about a value of 0.5 at 3.375MHz. This virtually halves the number of non-zero 
values in the impulse response and allows a single filter to be used to process both colour- 
difference signals in multiplexed form. To allow these filtering methods to be used, the 
specification for colour-difference filtering therefore includes provision for a minimum of 6dB 
attenuation at the half sampling frequency, instead of the 12dB specified for luminance. As the 
conversion from 6.75MHz to 13.5MHz includes no 6.75MHz sample-and-hold, the stopband attenuation of 
this interpolation process needs to rise to 55dB in the 6.75MHz region to achieve similar 
performance to that of Figure 3(b). This level of attenuation is easy to achieve with a digital 
filter.
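

The saving offered by a skew-symmetric characteristic can be seen in a small example. The sketch below builds a windowed-sinc low-pass filter with its 0.5 point at one quarter of the sample rate (3.375MHz for 13.5MHz samples). The Hamming window and the filter length are arbitrary choices made for illustration and are not taken from the standard.

```python
import math


def halfband_taps(half_length):
    """Windowed-sinc low-pass taps with cutoff at a quarter of the
    sample rate, for n = -half_length .. +half_length."""
    taps = []
    for n in range(-half_length, half_length + 1):
        if n == 0:
            ideal = 0.5
        else:
            ideal = math.sin(math.pi * n / 2) / (math.pi * n)
        window = 0.54 + 0.46 * math.cos(math.pi * n / half_length)  # Hamming
        taps.append(ideal * window)
    return taps


def gain(taps, freq_over_fs):
    """Amplitude response of a symmetric filter at freq_over_fs."""
    half = (len(taps) - 1) // 2
    return sum(tap * math.cos(2 * math.pi * freq_over_fs * (i - half))
               for i, tap in enumerate(taps))


taps = halfband_taps(7)
nonzero = sum(1 for tap in taps if abs(tap) > 1e-12)
print(len(taps), "taps, of which", nonzero, "are non-zero")
print(f"gain at a quarter of the sample rate: {gain(taps, 0.25):.3f}")
```

Every even-numbered tap other than the centre one is zero, so only 9 of the 15 taps need a multiplier, and the gain at a quarter of the sample rate is 0.5, that is 6dB down.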


Because of the 2:1 relationship in the specifications for luminance and chrominance filters, the
ringing introduced on sharp transitions due to band limiting will be similar in amplitude, but the
dominant frequency of the luminance rings will be twice that of the chrominance rings. While the
luminance ringing is barely visible at normal viewing distances, the chrominance rings are an
obvious impairment. It is possible to remove the rings by introducing an additional filter having
a slow roll-off characteristic at the final conversion to analogue form. In many cases the final
conversion would take place in a composite PAL coder which would necessarily contain a slow
roll-off filter so ensuring that any aliasing remaining in the transition band of the sharp-cut
filters would be further attenuated. The use of slow roll-off filters other than at the final
conversion from 4:2:2 would reduce the bandwidth available in the signal chain for chrominance
processing such as chroma-key, and if cascaded would cause a serious loss of resolution.


2.4 CODING RANGES 


In Recommendation 601, the signal coding ranges are defined in terms of uniformly quantised 
p.c.m., using 8 bits per sample. The 8-bit codes representing the signal values therefore range 
from 0 to 255 in decimal form or 00 to FF in hexadecimal notation. The codes have been chosen so 
that a standard level signal substantially fills the available range. This minimises quantisation 
distortion while providing a few levels at each end of the coding range to accommodate over-size
signals.


For a luminance signal, shown in Figure 4(a), black corresponds to level 16 and white to level 235, 
a range of 219 quantisation steps. This leaves a slightly greater headroom at the white end of the 
coding range. This is consistent with clipping at black and recognises that, while black level is 
reasonably easily controlled by clamping, white level is more difficult to control. 


The colour-difference signals range symmetrically from level 16 to 240 with zero signal 
corresponding to level 128, Figure 4(b). This method of numbering the levels constitutes a form of 
coding known as "offset binary". This allows both positive and negative values to be represented 
by positive numbers. While offset binary representation is easy to visualise, amounting to just a 
d.c. shift in the bipolar video colour-difference signal, digital processing using this code is 
generally not the most efficient and conversion to twos complement form prior to processing is
normal. 
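

The conversion between offset binary and twos complement amounts to inverting the most significant bit. A minimal sketch for 8-bit colour-difference codes follows; the function names are illustrative.

```python
def offset_to_signed(code):
    """8-bit offset-binary code (0..255) -> signed value (-128..127)."""
    twos = code ^ 0x80                 # invert the most significant bit
    # The bit pattern is now twos complement; interpret it as signed.
    return twos - 256 if twos & 0x80 else twos


def signed_to_offset(value):
    """Signed value (-128..127) -> 8-bit offset-binary code."""
    return (value & 0xFF) ^ 0x80


for code in (16, 128, 240):
    print(code, "->", offset_to_signed(code))
```

Level 128 maps to zero and the extremes of the colour-difference range, levels 16 and 240, map to -112 and +112.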


Coding levels 0 and 255 (00 and FF) in both the luminance and colour difference signals are 
reserved for synchronising information, so that only levels 1 to 254 (01 to FE) are available for 
video signals. 
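

The coding ranges described above can be summarised in a short sketch. The input normalisation is an assumption made for illustration: luminance is taken as 0 (black) to 1 (white) and each colour-difference signal as -0.5 to +0.5. Over-size signals are limited so that the reserved codes 0 and 255 are never produced.

```python
def clamp_video(code):
    """Keep a code within levels 1 to 254; 0 and 255 are reserved."""
    return max(1, min(254, code))


def quantise_luma(y):
    """Luminance 0.0 (black) .. 1.0 (white) -> levels 16 .. 235."""
    return clamp_video(round(219 * y) + 16)


def quantise_chroma(c):
    """Colour difference -0.5 .. +0.5 -> levels 16 .. 240, zero at 128."""
    return clamp_video(round(224 * c) + 128)


print(quantise_luma(0.0), quantise_luma(1.0))
print(quantise_chroma(-0.5), quantise_chroma(0.0), quantise_chroma(0.5))
print(quantise_luma(1.2), quantise_luma(-0.2))   # over-size signals
```

Python's `round` resolves exact halves to the nearest even number; a hardware implementation may round differently.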


In 4:4:4 coding systems, there is the possibility of using RGB as an alternative to luminance and 
colour difference signals. The coding ranges used for the R, G and B signals are identical to 
those specified for luminance in Figure 4(a) as this simplifies matrixing between the two forms. 


All the features of the signal coding ranges for 4:2:2 and 4:4:4 signals are identical on both the
525/60 and 625/50 scanning standards.


3. DIGITAL VIDEO INTERFACES


3.1 DIGITAL SIGNAL PROCESSING AND INTERFACE WORD SIZE 


One of the consequences of signal processing is that additional bits are generated. The addition 
of two eight-bit numbers produces a nine-bit result, while multiplying two eight-bit numbers 
together results in a 16-bit result. It is not practical to preserve all the bits resulting from 
signal processing and means have to be employed to reduce the word size to a manageable size before 
the digital video signal is passed from one equipment to another through an interface. Simply 
truncating or rounding to eight bits will usually produce visible artefacts (contouring) on picture 
material such as shaded backgrounds and more sophisticated techniques are required. 
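

The difference between simple truncation and a more careful reduction of word size can be shown with a first-order error-feedback scheme. This is a generic sketch only; it is not a description of any manufacturer's proprietary method, and the function names and the 16-bit input width are assumptions.

```python
def truncate(samples, in_bits=16, out_bits=8):
    """Discard the least significant bits of every sample."""
    return [s >> (in_bits - out_bits) for s in samples]


def error_feedback(samples, in_bits=16, out_bits=8):
    """Reduce word size, carrying the discarded bits into the next sample."""
    drop = in_bits - out_bits
    mask = (1 << drop) - 1
    max_out = (1 << out_bits) - 1
    out = []
    err = 0
    for s in samples:
        total = s + err                       # add back what was lost last time
        out.append(min(total >> drop, max_out))
        err = total & mask                    # remember what is lost this time
    return out


flat_field = [0x1080] * 4     # a constant level halfway between 16 and 17
print(truncate(flat_field))
print(error_feedback(flat_field))
```

Truncation gives a constant 16 and so loses the half level entirely, whereas error feedback alternates between 16 and 17 so that the average level of 16.5 is preserved.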


Within the EBU it has been maintained that effective means exist (such as error-feedback or 
Quantel’s dynamic rounding) for the reduction of word size and that eight-bit resolution is 
adequate at the interface. In the SMPTE, however, it is held that ten bits are required, despite 
the extra cost, and they have defined their interfaces in ten-bit terms. Inevitably, manufacturers 
of distribution equipment have made provision for handling ten-bit signals, even though the all- 
important digital recorder can carry only eight-bit data.


In the following descriptions of the interfaces, ten-bit word sizes have been assumed, except where 
stated otherwise. 


3.2 OPTIONS 


CCIR Recommendation 601 provides a framework for the generation of high-quality digital video 
component signals. In order to preserve the quality that Rec. 601 permits, the signals should 
remain in digital form when they are transferred from one equipment or area to another. To permit 
equipments to be interconnected, internationally-agreed specifications have been produced, setting 
out the manner in which digital video studio equipments should be interfaced to each other; these 
are contained in CCIR Recommendation 656. 


The total data rate required for the three component signals is 27 Mword/s, that is 270 Mbit/s at 
ten-bit resolution. 


Most, if not all, equipment in a digital television studio will process digital video signals in a 
bit-parallel format for the foreseeable future. Conversion from parallel format to a bit-serial 
format for the interconnection adds complexity and cost, and an interface is clearly required that 
is capable of carrying the signal in a bit-parallel form over the relatively short distances found 
within a television studio. For longer, inter-studio connections, a serial-format link is 
required, to take advantage of the lower cost of coaxial cable over multi-pair cable, to permit the 
use of optical fibre cables and to simplify the design of routing matrices. 


There are at least six ways of carrying the data, based on bit-parallel and bit-serial versions of 
the following: 


- three separate bearers for the three component signals; 

- two bearers, each operating at 13.5 Mword/s, one for the luminance signal, Y, and the 
other for a multiplex of the two colour-difference signals; 

- a single bearer operating at 27 Mword/s and carrying a multiplex of the luminance and 
colour-difference signals. 


The use of several bearers for the individual component signals is operationally inconvenient, 
raises problems of maintaining the correct timing relationship between the signals and is a less 
cost-effective solution than the single bearer (increased cable, connector and termination costs). 


Specifications have been produced for interfaces in both bit-parallel and bit-serial 
formats, based on single cables carrying a multiplex of digital component signals. 


3.3 THE BIT-PARALLEL INTERFACE 
The parallel interface can be considered under the following headings: 


Signal format 
Synchronisation 
Electrical parameters 
Cables and connectors 


3.3.1 Signal Format 


CCIR Recommendation 601 states that there shall be 720 luminance samples and 360 samples of each 
colour-difference signal on each active line, i.e. a total of 1440 samples on each active line, and 
this is the same for both 525- and 625-line signals. There are 288 luminance and chrominance 
samples on each line outside the active line, i.e. in digital blanking (for 525-line signals there 
are 276 luminance and chrominance samples in the blanking period for each line). 
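
The sample counts can be checked against the 27 Mword/s interface rate. The short Python sketch below derives the number of blanking words per line from the word rate and the line frequency; the scanning parameters (625 lines at 25 frames/s, 525 lines at 30000/1001 frames/s) are not given in the text and are stated here as assumptions.

```python
from fractions import Fraction

WORD_RATE = 27_000_000                 # words per second on the interface
ACTIVE_WORDS = 720 + 360 + 360         # Y + Cb + Cr words on each active line

# Assumed scanning parameters: (lines per frame, frames per second).
SYSTEMS = {
    "625-line": (625, Fraction(25)),
    "525-line": (525, Fraction(30000, 1001)),
}


def blanking_words(lines_per_frame, frame_rate):
    """Words per line left over once the 1440 active words are accounted for."""
    words_per_line = Fraction(WORD_RATE) / (lines_per_frame * frame_rate)
    assert words_per_line.denominator == 1   # a whole number of words per line
    return int(words_per_line) - ACTIVE_WORDS


for name, (lines, rate) in SYSTEMS.items():
    print(name, blanking_words(lines, rate))  # 625-line 288, 525-line 276
```

Both systems therefore carry exactly the same 1440 active words per line and differ only in the length of the digital blanking interval.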


In order to create the multiplex of samples from the luminance and chrominance signals, data words 
are interleaved as shown in Figure 5. Because of the orthogonal sampling structure, the sequence 
is repeated on each line. Thus there is a repeated sequence of groups of four words on each line. 


..><[Cr:Y:Cb] Y><.. 


Where [Cr:Y:Cb] represents the co-sited samples and Y, the final luminance sample of the four-word 
sequence, has no co-sited chrominance samples. 


3.3.2 Synchronisation 


Having multiplexed the component signals into a single data-stream, means must be provided to 
permit the accurate demultiplexing of the data into the individual component signals. This is 
achieved through the use of synchronising signals embedded in the multiplex signal. 


These synchronising signals are necessary also to permit digital video signals from different areas 
to be synchronised and to enable the active portion of each line to be separated from the blanked 
portion. They also have a role to play in the serial interface, as described below. In addition, 
they facilitate the regeneration of an analogue synchronising waveform at the point where the 
digital component signals are converted to analogue form. 


In order to achieve a high degree of commonality between the 625- and 525-line specifications, the 
synchronising signals - known as timing reference signals (TRS) - are placed at the beginning and 
end of the active line data; that is, there are 1440 samples of video data between them in both 
525- and 625-line systems (the systems differ only in the number of samples in blanking). 


A number of criteria can be established for the synchronising signals. They must be unique so that 
video or other data is not mistaken for them, they must be easily detected and they must be robust 
- ideally, capable of reliable detection even in the presence of errors. 


The multiplex signal contains a basic four-word sequence as described above, and the timing 
reference signals are also based on a sequence of four words: 


FF 00 00 XY (hexadecimal notation) 


The first three words form a unique fixed preamble through the use of the data words FF and 00, 
which are reserved for "housekeeping" purposes and are excluded from the video data coding range. 
The final word, XY, is made up as shown in Figure 6; it contains three bits, F, V and H, the 
polarities of which indicate odd/even field, field blanking on/off and line blanking on/off 
respectively. The point at which F, V and H are all set to 0 is the beginning of the active 
picture on the first field of the two-field sequence. The four least significant bits of the final 
word are protection bits to enable the F, V and H bits to be decoded even in the presence of a 
single-bit error, so providing robustness. The protection is based on Hamming coding. (The two 
remaining bits are set to 0.) 
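
The construction of the XY word can be sketched in Python as follows. Figure 6 is not reproduced here, so the bit layout and the parity equations used below are assumptions based on the commonly documented eight-bit form of the Rec. 656 timing reference word (a fixed 1 in the most significant bit, then F, V, H, then four protection bits); the function names are illustrative only. Single-bit errors are corrected here by a simple nearest-codeword search rather than by any particular hardware decoder.

```python
def xy_word(f, v, h):
    """Build the eight-bit XY word from the F, V and H flags (each 0 or 1)."""
    p3 = v ^ h
    p2 = f ^ h
    p1 = f ^ v
    p0 = f ^ v ^ h
    return ((1 << 7) | (f << 6) | (v << 5) | (h << 4)
            | (p3 << 3) | (p2 << 2) | (p1 << 1) | p0)


# All eight valid XY words, indexed by value.
VALID = {xy_word(f, v, h): (f, v, h)
         for f in (0, 1) for v in (0, 1) for h in (0, 1)}


def decode_xy(word):
    """Return (F, V, H), tolerating one corrupted bit in the received word."""
    nearest = min(VALID, key=lambda code: bin(code ^ word).count("1"))
    return VALID[nearest]


def trs(f, v, h):
    """The complete four-word timing reference signal."""
    return [0xFF, 0x00, 0x00, xy_word(f, v, h)]


print([hex(w) for w in trs(0, 0, 0)])   # ['0xff', '0x0', '0x0', '0x80']
print(decode_xy(0x9D ^ 0x04))           # (0, 0, 1) despite one flipped bit
```

Under these assumptions any two valid XY words differ in four bit positions, which is why a single-bit error can always be traced back to the word that was sent.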


The way in which the F, V and H bits change state during field blanking is illustrated in Figure 7 
and the timing relationship between the timing reference signals and analogue synchronising signals 
is shown in Figure 8. 


It should be noted that, unlike the case for an analogue video signal, the signal at the digital 
video interface contains no half-lines and, consequently, nothing corresponding to equalising 
pulses. All lines are of the full length, to facilitate signal processing and to maintain 
commonality between the 625- and 525-line systems. 


The timing reference signal which occurs immediately prior to the start of the active line is also 
known as the SAV (Start of Active Video) signal and the one which follows the active video data as 
the EAV (End of Active Video) signal, particularly in North America. 


3.3.3 Electrical Format 
Desirable features of the line sending and receiving arrangements include: 


- the use of low-cost, general purpose integrated circuits; 
- simple circuitry in senders and receivers; 

- capability to operate with long lengths of cable; 

- freedom from adjustment for length of cable; 

- freedom from crosstalk. 


The last requirement can be largely met by the use of balanced transmission over terminated cables, 
while the first and second requirements favour the use of the standard ECL 10000 series devices as 
these operate reliably at a 27MHz clock frequency and have balanced inputs and outputs. Over short 
distances, say up to 30 metres, simple senders and receivers can be made from TTL-to-ECL and 
ECL-to-TTL translators respectively. 


A benefit of having the signal in digital form is that the accuracy of equalisation required to 
extend the range beyond a few tens of metres is much less critical than is the case for analogue 
transmission, and the same fixed equaliser will operate on links of very different lengths. An 
equaliser characteristic has been established which has been found to permit operation on 
cables up to 200 m in length and this is shown in Figure 9. An implementation of the equaliser is 
shown in Figure 10: note that this circuit incorporates the 110-ohm termination for the cable. (In 
early work the equaliser was fitted permanently and was found to allow operation from zero to 200 m. 
However, some users found that the equaliser worsened the effect of crosstalk in an environment 
where there are many relatively short links in use. Consequently, it is now more usual for the 
equaliser to be optional, being fitted when the link exceeds about 30 m in length.) 


In the interests of simple sending and receiving circuitry, the data multiplex is transmitted in 
NRZ form and is not inherently self-clocking. Therefore a clock signal at 27 MHz is transmitted 
along with the data, on a ninth signal pair. 


Whilst the use of 27MHz represents the most straightforward approach, it requires the link to carry 
a signal that is much higher in frequency than the spectrum of the data signals. Experiments 
showed that while it was possible to transmit a clock signal over greater distances by using a 
lower clock frequency (e.g. 6.75MHz or 13.5MHz), the gain was negated by the need for additional 
clock regeneration circuitry such as a phase-locked loop oscillator on every interface. Since the 
majority of links in a studio are less than thirty metres, well within the capability of simple 
receivers, the additional complexity could not be justified. 


The clock transitions are specified as occurring mid-way between data transitions. This was chosen 
to reduce the sensitivity of the clock edge to jitter produced by crosstalk from the video data and 
to differences in path lengths between the clock and the data signals. 


3.3.4 Cables and Connectors 


The quality of cable used for a digital video link can be selected according to the length of the 
link. The most important feature of the cable is that all the signal pairs should have identical 
path lengths and this has to be taken into account in the design of the cable if unacceptable 
timing differences are not to occur. 


Digital apparatus generates a significant amount of energy at the clock frequency of 13.5 MHz and 
its harmonics, including 121.5 MHz, an international aeronautical distress frequency, so cables 
should be screened and the screening continued through the connector to the apparatus. 


Suitable cables have been produced for links up to 200 metres. These tend to be slightly larger 
than conventional studio-quality coaxial cable. For short links of up to about 10 metres, 
relatively inexpensive general-purpose screened multipair cable can be used. Although it can be 
used for very short links, the use of flat ribbon cable should be limited to within screened 
enclosures as it is not itself screened. 


The choice of connector for the interface is a compromise, attempting to meet the following, 
conflicting requirements: 


- ease of assembly and repair 

- small size 

- rugged when mated 

- high minimum rated insertions and withdrawals 
- multiple sourcing and world-wide availability 

- low cost 


The connector specified, the subminiature type D, has the advantages of being widely used for other 
applications, well accepted and available in a variety of contact forms and versions. The contacts 
themselves, when gold plated, offer adequate reliability. The assignment of signals to pins in the 
connector is shown in Figure 11. Note that an 8-bit signal will occupy contacts D9 (MSB) to D2. 


A locking mechanism is essential and, initially, the slide-lock was selected as it occupies little 
panel space and requires no tools for its operation. However, it proved to be less robust than 
required and somewhat expensive and it has been superseded by the ubiquitous screw-lock mechanism 
with UNC 4-40 threads. 


Whilst the type-D connector is suitable as an installation connector, it is not designed for 
jackfield-type applications, for which a different style of connector is required. No connector 
has been standardised for this application as yet. 


3.3.5 Experience with the Parallel Interface 

At the time of writing, a large number of organisations have installed digital video equipment 
interconnected by means of the parallel interface. The major problem experienced has been due to 
the size of the type-D connector, which limits the number of connectors that can be mounted on a 
panel, and the large number of signals that have to pass between the interfaces and electronics 
modules, a particular problem in the case of studio routing matrices. 


Clock regeneration has not proved necessary, except at the very limit of long links, where signal 
regenerators have been fitted to re-establish the correct clock-to-data timing. 


3.4 THE BIT-SERIAL INTERFACE 


Whilst the majority of signal processing operations within a studio require access to the signal in 
bit-parallel form, for which the bit-parallel interface is intended, there is a requirement for a 
bit-serial interface for use on long inter-area links. This would permit transmission over a 
single bearer such as coaxial or optical-fibre cable and would avoid problems caused by 
clock-to-data timing skew that can occur on long multi-pair cables. 


Since the bit-parallel and bit-serial interfaces have to co-exist within a studio complex, one of 
the basic requirements is that the serial format should be derived directly from the parallel 
format, to allow easy transformation from one format to the other. 


3.4.1 Coding Strategy 


Probably the most important aspect of the serial interface is the coding strategy to be employed: 
it is not possible to transmit a directly-serialised version of the parallel-format signal over any 
significant distance. It is necessary to code the signal in some way for transmission. A number 
of criteria for the serial signal need to be met, including: 


- The serial format should be derived from the parallel format. (As a consequence the 
presence of timing reference signals in the data stream can be assumed.) 


- The code must be suitable for transmission on both coaxial and optical fibre cables. This 
implies that multi-level codes would be unacceptable. 


- The coded signal should have a very small low-frequency content (to allow AC coupling, to 
simplify cable equalisation and permit use of optical fibre cables.) 


- The code should require limited overhead (ideally, zero) in order to restrict the signal 
bandwidth and clocking rate of the high-speed circuitry, to simplify equalisation and 
maximise the transmission distance. 


- The coding system should be simple to implement. 


The last requirement, simplicity of implementation, is particularly important in the context of 
television production centres, where a large number of links are employed. 


In addition, there are a number of practical requirements that the coding strategy must meet: 


- The coding system must be transparent; all combinations of data words must be transmitted 
faithfully. 


- Adequate timing information must be contained in the serial signal to permit reliable and 
straightforward extraction of a clock signal in the receiver. 


- Error detection, although not considered essential, would provide a means of detecting 
link or equipment malfunction. 


- Single-bit errors in the serial signal should not result in extension of the error to more 
than one word in the receiver. 


Assessed against these criteria, the coding methods of scrambling and block encoding appeared 
suitable. In the block encoding method, each of the 256 eight-bit data words is mapped to a larger 
word having good transmission characteristics: low DC component (achieved by having as near as 
possible an equal number of 1s and 0s) and a large number of transitions to ease clock regeneration 
(through minimising the number of adjacent bits of the same polarity). This method allows good 
control of the low-frequency energy and permits simple equalisation and operation over 
optical-fibre cables. Scrambling, on the other hand, results in a simpler implementation but incurs the 
risk of error extension in the descrambler following a single bit error on the link and has poorer 
control over the low-frequency energy. 


The block encoding method was selected, based on mapping the 8-bit data words to selected 9-bit 
transmission words to minimise the bit rate and so maximise the transmission distance. An 
interface specification was agreed by the EBU and by CCIR, and is included in CCIR Rec. 656. 
However, integrated circuits for this interface were slow in appearing and an alternative 10-bit 
interface was proposed by the Sony Corporation, based on the scrambling technique and potentially 
suitable for both digital component and digital composite interfaces. This proposal has received 
the support of SMPTE and, as a result of the influence of Sony in the broadcast market, has 
superseded the "official" serial interface as a de facto standard. The EBU has recognised this 
situation and has withdrawn its support of the block-encoding interface; therefore, no description 
of this interface is included here. Interested readers should refer to reference 9 for a 
description if required. Unfortunately, the scrambling interface permits operation with ten-bit 
data words and its adoption by SMPTE has re-opened the 8-bit/10-bit argument. 


3.4.2 The Scrambling Interface 


The sending interface accepts the parallel interface signal, serialises it and passes it through a 
scrambler before sending it to line. At the receiving end, the signal passes through a descrambler 
and deserialiser to reappear in parallel form. 


Scrambling has the effect of breaking up long runs of 1s and 0s and produces a power spectrum 
approximating to random noise. In mathematical terms, scrambling is equivalent to multiplying the 
signal by a polynomial, G(x). In the case of the interface, the polynomial is the product of two 
polynomials G1(x).G2(x): 


G1(x) = 1 + x^4 + x^9 and G2(x) = 1 + x 


It is perhaps easier to understand from the block diagram, shown in Figure 12. It can be seen to 
be analogous to clocking the serialised signal through a shift register which has feedback applied 
from various stages to the input via exclusive-OR gates. 


The first polynomial has the effect of scrambling the data-stream. The action of the second 
polynomial is to cause each logic 1 appearing at the output of the first polynomial to be converted 
to a transition from either state at the output of G2(x) while no transition occurs for a logic 0 
output from G1(x). Thus the interface transmits changes of state rather than 1s or 0s, so making 
the interface independent of polarity. 


One of the potential problems with a scrambler is that it is possible to devise an input which will 
result in a long run of 0s at the output. In the case of this interface the maximum theoretical 
run of 0s is 38. However, because of the limitation on the use of data words 00 and FF, the 
practical limit of consecutive 0s becomes 25. This represents a substantial low-frequency 
component and it remains to be seen if this will be a problem. The only integrated deserialiser 
available at present incorporates an equaliser which is claimed to cope with this potential 
limitation. 

Descrambling is the reverse of scrambling and the block diagram of the descrambler is shown in 
Figure 13. There are ten stages involved in the descrambler, so any single-bit error in the 
incoming serial data will affect the following ten bits. This means that up to two ten-bit words 
(but no more) will be affected by a single bit error. 
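
The scrambler and descrambler described above can be sketched in Python as a pair of shift-register routines. The sketch takes G1(x) as 1 + x^4 + x^9 and G2(x) as 1 + x; the function names, the all-zero starting state of the registers and the random test data are assumptions made for illustration, and Figures 12 and 13 remain the authoritative description.

```python
import random


def scramble(bits):
    """Scramble with G1 (taps 4 and 9 bits back), then send transitions (G2)."""
    reg = [0] * 9                    # reg[0] holds the most recent scrambled bit
    line_state, line = 0, []
    for d in bits:
        s = d ^ reg[3] ^ reg[8]      # feedback from the 4th and 9th stages
        reg = [s] + reg[:-1]
        line_state ^= s              # a 1 toggles the line, a 0 leaves it alone
        line.append(line_state)
    return line


def descramble(line):
    """Convert transitions back to 1s and 0s, then undo the G1 scrambling."""
    reg = [0] * 9                    # holds the received scrambled bits
    previous, out = 0, []
    for b in line:
        s = b ^ previous             # a change of state means a 1 was sent
        previous = b
        out.append(s ^ reg[3] ^ reg[8])
        reg = [s] + reg[:-1]
    return out


rng = random.Random(1)
data = [rng.randint(0, 1) for _ in range(400)]
line = scramble(data)
assert descramble(line) == data                      # transparent link

# Polarity independence: an inverted line decodes correctly once the
# descrambler's registers have filled with received bits.
inverted = [b ^ 1 for b in line]
assert descramble(inverted)[10:] == data[10:]

# Error extension: one corrupted line bit damages several decoded bits.
corrupted = list(line)
corrupted[100] ^= 1
damaged = [i for i, (a, b) in enumerate(zip(descramble(corrupted), data)) if a != b]
print(damaged)                                       # [100, 101, 104, 105, 109, 110]
```

The damaged positions all lie within the errored bit and the ten bits that follow it, which agrees with the statement that at most two ten-bit words are affected by a single-bit error.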


Having descrambled the data, it remains to be deserialised and for this to be achieved correctly 
the start of each data word has to be identified. This operation, word synchronisation, makes use 
of unique, known, regularly-occurring information contained in the data-stream: the timing 
reference signals, TRS, with their unique FF 00 00 preamble. If detection of the preamble can be 
accomplished in the serial domain prior to deserialisation, the correct deserialising phase can be 
achieved on receipt of the first TRS, i.e. within one television line. 


3.4.3 Electrical Format 


The serial sender is specified as having an output signal amplitude of 700mV peak-to-peak, 
compatible with the output of emitter-coupled logic devices. It is intended that a 75-ohm coaxial 
cable shall be the normal signal bearer, so a 75-ohm output impedance is specified. 


3.4.4 Mechanical 


The widely-used BNC connector is specified for the serial interface. Because of its greater 
ruggedness, the 50-ohm version of the connector is used. 


3.4.5 Experience with the bit-serial interface 


At the time of writing, the use of the scrambling serial interface has been limited, due to the 
lack of availability of the essential integrated circuits. This situation has now changed and a 
variety of equipment is becoming available with the interface fitted, including all-serial routing 
matrices. Transmission over distances of 300 metres using studio-quality coaxial cable and 500 
metres with CATV cable has been demonstrated. Operation with optical fibres has been demonstrated 
also, though it is anticipated that it will be several years before these become economically 
attractive for studio use when compared with coaxial cable. 


3.5 ANCILLARY DATA CHANNEL 


It is common practice in analogue television studios to make use of the lines which occur during 
field blanking for carrying ancillary signals. These signals include test-signals, such as 
insertion test signals used for checking the performance of circuits, and time code signals which 
enable each television frame of a recorded signal to be identified for editing purposes. In 
addition, the synchronising pulse itself is used to carry a digitised sound signal over the links 
between studios and transmitters. 


The digital video interface provides a very large capacity for carrying ancillary signals in both 
the digital field- and line-blanking periods (1720 bytes per line). Work is still in progress to 
define a standardised way in which ancillary signals are to be handled. 


RTS LECTURE 
UNDERSTANDING DIGITAL TELEVISION 


DIGITAL VIDEO PROCESSING IN POST PRODUCTION 
SPEAKER: NEIL HINSON, CHIEF ENGINEER, QUANTEL 


TONIGHT’S LECTURE 

I am here to talk about Post Production in your series of lectures on Understanding 
Digital Television. Post Production covers an enormous range of topics - far more 
than I can cover tonight - so I plan to be fairly narrow in what I talk about. 


WHAT IS POST PRODUCTION? 

Post Production is the name given to the composition of existing video and the 
creation of new video to produce a finished video “product” - television commercial, 
sting, promo, drama production, documentary or whatever. 


WHY SHOULD VIDEO PROCESSING BE NECESSARY? 

For two reasons: 

1. To correct for errors during the original shoot - “don’t worry, we’ll fix it in Post”. 

2. To deliberately create images which never existed in reality. 


OPERATIONS IN POST PRODUCTION 

1. PLAYBACK & RECORDING 

2. EDITING 

3. MIXING, KEYING & LAYERING 

4. SYNTHESIS - Paint systems, character generators and computer-generated 3D 

5a. SPATIAL MANIPULATION - Flying pictures 

5b. AMPLITUDE MANIPULATION - Colour correction. 


Because Post Production is such a wide area I will concentrate on the basics of the part 
which is unique to digital television and which hasn’t or won’t be covered by other 
lectures - television digital video processing. I will cover the basics of mixing and 
keying and a little about colour correction because they allow me to introduce some of 
the basic building blocks of digital video processing. And I will then use those building 
blocks to describe how simple 3D spatial manipulation (i.e. flying pictures with 
DVEs) works. 


I won't be talking about playback, recording or cut-editing, because they are more 
about control than processing, and I will not touch synthesis because I don't have the 
time. 


WHAT I WANT TO ACHIEVE WITH THIS LECTURE 
I am aiming this lecture at two groups of people: 

1. Those who know very little about digital television (but may be familiar with 
the workings of analogue television) and who want to get some understanding 
of how the things they see on television every day are accomplished. 

2. Those who already work in digital television, perhaps as Test Engineers or 
Service Engineers, who already know how digital Post Production machines 
work and now want to know why they work as they do - and perhaps why 
they aren’t better than they are! 


I am not aiming at Design Engineers - indeed I will be trying very hard not to give 
away too many trade secrets. 


What I want to convey is a broad understanding of how digital mixers and keyers, 
basic colour correctors and 3D effects machines work. I also want to illustrate the 
range of options and compromises which face the design engineer and so give some 
insight into why the final choices are made. The trade-off is inevitably between design 
time, cost, quality and processing speed. 


Earlier lecturers in this series of RTS lectures will have told you that the big attraction 
of digits is that once you have digitised an analogue video signal it stays perfect no 
matter how many times you pass it around. One of the things which I want to point 
out is that when it comes to digital video processing this is most definitely not true. 
Almost any digital video process degrades the video; the art is in minimising that 
degradation. 


CONTROL vs. PROCESS 

Superficially, Post Production equipment from the various manufacturers appears very 
different - even when it is doing much the same job. From dedicated “application 
specific” boxes to general purpose “standard platforms” and everything in between, 
controlled by knobs, buttons, fader-arms, keyboards, joysticks, tracker balls, mice, and 
pens. But underneath they all have a common feature - they all carry out mathematical 
operations on the digital values of video pixels. I do not intend to talk about the way a 
particular box is controlled in order to do its processing, but I will be talking about the 
processing itself. I do not even care whether the box contains dedicated hardware or is 
just a computer - to achieve a given result it must execute much the same mathematics. 


A DISCLAIMER 

Finally a disclaimer. Some of what I tell you tonight may well be subject to patents. If 
you are a design engineer - and as I said, this lecture is not aimed at designers - then be 
aware of this. 


COLOUR CORRECTION 


Not so much colour correction as brightness and contrast adjustment, because they let 
me introduce two of the basic building blocks of digital video processing: adders and 
multipliers. 


COLOUR SPACE RECAP. 

[FIG 1 - The RGB Colour Cube] 

First a recap on colour space representation, which applies to both analogue and digital 
television. The colour of any point on the television screen needs three values to 
describe it - hence the term “colour space” because of the analogy with physical space 
and its three dimensions. Computers tend to use red, green and blue, and the colour 
of a point on the TV screen can be described by the amount of red, green and blue 
present. This leads onto the concept of the Colour Cube which represents the three- 
dimensional colour space which contains all possible RGB colours. 


Notice that the eight corners of the Colour Cube are the colours of colour-bars. 
Notice also that the principal diagonal is the luminance axis. All points on the 
luminance axis are monochrome. 


[FIG 2 - Bottom view of the RGB Colour Cube] 

Broadcast television normally uses luminance and colour difference signals. These are 
simply a different set of axes which provide a different way of describing the same 
colour. Luminance (Y) describes the brightness of the point on the screen, and on its 
own would produce a monochrome (black and white) picture. The two colour 
difference signals B-Y (blue minus luminance) and R-Y (red minus luminance) describe 
the chrominance of the point on the screen. The digitised versions of these two colour 
difference signals are called Cb and Cr respectively. The plane of Cb,Cr is at right 
angles to the Y axis, and would be flat to the paper if you looked at the bottom of the 
colour cube so that the Y axis went directly into the paper. 


The digitised versions of (B-Y), (R-Y) are known as Cb,Cr partly to reflect the change 
in scaling which is applied. Cb and Cr can each be positive or negative. Roughly 
speaking, Cb describes the degree of blueness - or anti-blueness (i.e. yellowness) if it is 
negative. Cr describes the degree of redness - or anti-redness (i.e. cyan-ness) if it is 
negative. 


Notice that the entire RGB cube is contained within the Y,Cb,Cr cube - hence many of 
the Y,Cb,Cr colours are “illegal” in RGB terms. Some, but not all, of these 
RGB-illegal colours will also make illegal PAL. 


[FIG 3 - Hue and Saturation] 

For completeness, although they are not normally used in digital television, I will mention hue 
and saturation. Hue and saturation are an alternative method of describing 
chrominance - using polar co-ordinates rather than the Cb,Cr Cartesian co-ordinates. 
Hue is the angle of the colour in the chrominance plane, while saturation is the distance 
from the centre. For example, pink is a less saturated version of “Post-office” red, but 
they can both have the same hue (and indeed could have the same brightness as well). 


Notice the similarity between the chrominance plane in FIG 3 and the display on a 
vectorscope. 


Having said all of that, I will largely ignore the colour or chrominance signals this 
evening and deal with the luminance only. The Cb,Cr signals are handled in much the 
same way as the luminance signal, except that due account must be taken of the fact 
that they are only sampled at half the rate of the luminance signal. 


DIGITISING RECAP. 

BINARY ARITHMETIC. 

All digital video processing machines, either dedicated machines or standard 
computers, use binary arithmetic when they process digital video. Inevitably I am 
going to have to get a bit mathematical. First I will have to touch on some of the 
basics of binary arithmetic. However, if I lose you, don't worry too much - as far as 
possible I will try not to make the later parts of this talk too dependent on a thorough 
knowledge of the subject. 


[FIG 4 - Example of a binary number] 

The decimal numbers we know and love count in 10s. Each digit can have 10 different 
values, 0..9. Each digit position in the number, working from the right, is worth 10 
times that of the digit to its right. 


Binary numbers work in very much the same way, except that they count in 2s. So for 
“10” above, read “2”. Each digit in a binary number can have 2 different values, 0..1. 
Each digit position in the number, working from the right, is worth 2 times that of the 
digit to its right. Each binary digit is known as a bit. 


From the above it follows that an 8 digit binary number (or 8-bit number) can have a 
maximum value of 255 (decimal). An easy (!) way of working out the maximum value 
of a given size binary number is to remember that it is 2^(number of bits) - 1. So the 
maximum value of a 10-bit number is 1023. 
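
For anyone who would rather see this worked than take it on trust, here is a very small Python sketch of the rule. The function name is simply a label chosen for illustration.

```python
def max_value(bits):
    """Largest value that fits in a binary number of the given size."""
    return 2 ** bits - 1


for bits in (8, 10):
    # Show the size, the maximum value and the same value written in binary.
    print(bits, max_value(bits), format(max_value(bits), "b"))
```

The binary form of each maximum is just a row of 1s, one for every bit in the number.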


DIGITISING STANDARDS. 
A mention of some of the various Standards relating to digital video: 
CCIR 601: International digital video sampling standard covering 625 and 
525 systems. 


CCIR 656: International digital video interface standard covering 625 and 
525 systems. Includes both parallel and serial interfaces. 


SDI: Serial Digital Interface. Another term for the serial version of 
CCIR 656. 


4:2:2: Actually the sampling frequency ratios of Y, Cb, Cr sampled 
according to the most common option in CCIR 601. 
“4” = 13.5MHz. “2” = 6.75MHz. 
Frequently used colloquially in place of CCIR 601 or 656. 


DIGITAL VIDEO SAMPLING. 

In a previous lecture it was explained that the international CCIR 601 video sampling 
standard divides the active period of each line (about 52µs of the total 64µs) into 720 
samples or pixels. The brightness (luminance) of each sample is described by an 8 or 
10 bit binary number, which means that the brightness can be described as being at one 
of 256 (0..255) or 1024 (0..1023) levels respectively. 
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[FIG 5 - CCIR 601 luminance sampling 

Taking the case of 8 bit luminance samples, the CCIR 601 standard also defines that 
level 16 (out of the total range of 0 to 255 levels) is black and level 235 is white. This 
is equivalent to the range 0mV to 700mV of the standard analogue luminance or 
analogue PAL signal. Levels below 16 (15 down to 1) accommodate analogue 
“undershoot” while levels above 235 (236 to 254) accommodate analogue 
“overshoot”. Levels 0 and 255 are prohibited since they are reserved for use in the 
CCIR 656 digital interface timing synchronising words ("TRSs"). In practice levels 0 
and 255 are usually allowed within a machine and removed at the final stage before the 
digital video is output on the serial CCIR 601 cable. 
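
The level mapping can be sketched in a couple of lines of Python. This is only an illustration: the function names are mine, and clamping to 1 and 254 is just one assumed way of "removing" levels 0 and 255 at the output stage.

```python
BLACK, WHITE = 16, 235          # 8 bit CCIR 601 luminance black and white levels


def luma_level_to_mv(level: int) -> float:
    """Analogue equivalent of an 8 bit luminance level: 16 -> 0mV, 235 -> 700mV."""
    return (level - BLACK) * 700 / (WHITE - BLACK)


def legalise_for_interface(level: int) -> int:
    """Keep levels 0 and 255 off the interface (assumed method: clamp to 1..254)."""
    return max(1, min(254, level))


print(luma_level_to_mv(16), luma_level_to_mv(235), legalise_for_interface(255))
```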


[FIG 6 - CCIR 601 chrominance sampling] 

The two colour difference (chrominance) signals, Cb and Cr may also be digitised to 
either 8 or 10 bits, although they will always be sampled to the same number of bits as 
the luminance. However they are each sampled only 360 times in each active line 
period - half as frequently as the luminance signal. For 8 bit sampling, CCIR 601 
defines that their nominal full range values are 16 to 240 (peak negative to peak 
positive) with level 128 being equivalent to zero chrominance. This swing matches the 
standard analogue swing of +/-350mV, with the levels below 16 and above 240 
accommodating overshoot. Again, levels 0 and 255 are prohibited. 


Because level 128 is equivalent to zero chrominance the digitised chrominance values 
are said to be in offset-binary or offset-128 rather than straight binary. By the same 
logic, the digitised luminance values should be regarded as offset-16 - a fact which I 
suspect is frequently forgotten. 


As I said, I will largely ignore the chrominance signals and concentrate on the 
luminance. Further, I will ignore 10 bit digitising and concentrate on 8 bit digitising. 


BRIGHTNESS ADJUSTMENT USING AN ADDER 

The digital equivalent of the television “brightness” control is the adder - which is 
exactly what the brightness control actually does; it adds an offset to the video signal 
and so makes dark colours and light colours brighter by an equal amount. 


[FIG 7 - Brightness Adder] 

The digital adder is the simplest of the digital video processing elements. It adds a 
given digital offset value to every pixel’s luminance level. The offset will generally be 
provided by the controlling CPU (central processing unit or computer) in a dedicated 
machine while the whole thing will be a software operation in a computer. 


[FIG 8 - Example of binary addition] 
An example. If the offset is 35 (i.e. 35 levels) the adder will add 35 to each digital 
video value. Video level 20 will become level 55 and video level 230 will become level 
265. Sounds simple, but if this is an 8 bit signal then level 265 presents a problem. An 
8 bit number can only represent numbers between 0 and 255, so 265 actually needs a 9 
bit number to represent it. So what do we do with 265? We limit it to 255. This is 
exactly equivalent to an analogue limiter. It is a simple piece of digital logic which 
detects an output from the adder greater than 255 and substitutes the value 255 in its 
place. 
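
In a computer the adder and its limiter reduce to very little. The following Python sketch uses a function name of my own; the limit at 0 for negative offsets is my assumption, since the description above only deals with results above 255.

```python
def add_brightness(level: int, offset: int) -> int:
    """Add a brightness offset to an 8 bit video level, then limit the result."""
    total = level + offset          # may need 9 bits, e.g. 230 + 35 = 265
    if total > 255:                 # the limiter: substitute 255 for anything larger
        return 255
    if total < 0:                   # assumption: negative offsets are limited at 0
        return 0
    return total


print(add_brightness(20, 35), add_brightness(230, 35))
```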


CONTRAST USING A MULTIPLIER 

[FIG 9 - Contrast Multiplier] 

The digital equivalent of the television “contrast” control is the multiplier. Again this 
is exactly equivalent to the analogue contrast (or gain) control: it makes the digital 
video value larger (or smaller). In a dedicated machine an 8x8 multiplier might be used 
for this job. An 8x8 multiplier is a digital element (normally a single integrated circuit) 
which multiplies two 8 bit numbers together. Since each of these numbers has a range 
of 0 to 255, the output from the multiplier can lie anywhere in the range 0 (0x0) to 
65025 (255x255) and is a 16 bit number. 65025 needs a 16 bit number because 16 bits 
are required to represent a number which is that large (in fact anywhere from 0 to 
65535). Clearly 65025 is a lot bigger than can be represented by an 8 bit video value 
and presents a problem. But actually there is an even bigger problem, which is that the 
multiplier I have described can only provide gain values 0, 1, 2 ... 255. So what if I 
want a gain of just a little more or a little less than one (“unity”)? 


[FIG 10 - Binary Fractions] 

The solution here is largely a conceptual one, but one which recurs throughout digital 
processing operations. Regard the gain factor as having values, not of 0 to 255 but of 
(0 to 255) times some fraction. If the maximum gain (contrast increase) we will ever 
require is a factor of two, then the gain value only needs to go as large as 2. To 
achieve this we will think of it as having a range of (0 to 255) x 1/128, i.e. 0/128 to 
255/128. 255/128 is 1 + 127/128 and is almost 2. In the example, gain value 187 
(decimal) is regarded as being 187/128 = 1+59/128 = 1.46 (approximately). 


[FIG 11 - Binary Fraction Multiplication] 

Using this concept, unity gain is 128/128 (i.e. one) and we can easily have a gain value 
just a little less than one - e.g. 127/128, or a little more - e.g. 129/128. This means that 
the multiplier output is now also regarded as being divided by 128 - i.e. the maximum 
output is 65025/128 (= 508 + 1/128). 


Why 1/128ths and not some other fraction? Well in this example it happens to suit us 
to use 1/128, but in any case the denominator must always be a power of 2 
(e.g. 128 = 2^7) because we are dealing with binary arithmetic. 


[Refer to FIG 11 again] 

We still have two problems with our multiplier: 

1. The largest output is still larger than the largest possible 8 bit video value of 255, so 
we need a limiter again. 

2. The output contains a fraction which we need to get rid of. In this case the lower 7 
bits of the 16 bit multiplier output are “fractional” bits and must be removed. 
Unfortunately just ignoring the fractional bits is not the right answer since this 
would be truncation and is unnecessarily inaccurate. A truncated answer is never 
more than the full answer, and can be less by almost one whole step level - e.g. 
35 + 127/128 will be truncated to 35 even though it is almost 36. A much better 
answer is to round the result to get rid of the fractional bits. Rounding sets the final 
answer to the nearest step level, in the above example to 36. A rounded answer is 
on average correct, while a truncated answer is on average one half step level 
lower. This is easily demonstrated by the fact that one way of rounding is to add 
one half and then truncate. Try it! 


So what does the digital logic of this rounder consist of? It consists of exactly what I 
have described - an adder which adds a fixed value of one half (actually 64/128) to the 
16 bit multiplier output and then truncates the lower 7 (fractional) bits by simply 
ignoring them. 
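
Putting the multiplier, the rounder and the limiter together gives something like the following Python sketch. The names are mine and it is only an illustration of the arithmetic described above, not of any particular machine. The gain is an integer number of 1/128ths, so 128 is unity and 187 is about 1.46.

```python
FRAC_BITS = 7                       # the lower 7 bits of the product are fractional
HALF = 1 << (FRAC_BITS - 1)         # one half, i.e. 64/128


def truncate(product: int) -> int:
    """Remove the fractional bits by simply ignoring them."""
    return product >> FRAC_BITS


def round_to_nearest(product: int) -> int:
    """Remove the fractional bits by adding one half and then truncating."""
    return (product + HALF) >> FRAC_BITS


def apply_contrast(level: int, gain_128ths: int) -> int:
    """8x8 multiply, round away the 7 fractional bits, limit to 255."""
    product = level * gain_128ths   # up to 65025, a 16 bit number
    return min(round_to_nearest(product), 255)


almost_36 = 35 * 128 + 127          # 35 + 127/128 expressed in 1/128ths
print(truncate(almost_36), round_to_nearest(almost_36), apply_contrast(100, 187))
```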


The error caused by truncation sounds pretty subtle, so does it matter? Yes it does. If 
several operations requiring rounding are carried out in sequence and truncation is 
used instead, the average value of the image will be reduced by one half step level each 
time and the image will grow steadily darker. 
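
The darkening is easy to demonstrate. In this Python sketch (the names and the test values are my own choice) a pixel is repeatedly given a gain of 129/128 followed by a gain of 127/128, which ought to leave it very nearly where it started.

```python
FRAC_BITS = 7                       # gains are in 1/128ths


def truncate(product: int) -> int:
    return product >> FRAC_BITS


def round_to_nearest(product: int) -> int:
    return (product + (1 << (FRAC_BITS - 1))) >> FRAC_BITS


def repeat_gain_pair(level: int, passes: int, remove_fraction) -> int:
    """Apply gain 129/128 then gain 127/128, 'passes' times over."""
    for _ in range(passes):
        level = remove_fraction(level * 129)
        level = remove_fraction(level * 127)
    return level


print(repeat_gain_pair(100, 10, truncate))          # drifts steadily darker
print(repeat_gain_pair(100, 10, round_to_nearest))  # stays where it started
```

In this particular example the truncated version loses one level on every pass while the rounded version does not move.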


If the operation involves the Cb,Cr chrominance signals the effect is even worse. In 
this case the reduction is not towards zero but towards the most negative level - which 
is green. Repeated use of truncation rather than rounding will introduce a green tinge 
to an image - an effect you may have seen for yourselves. 


Unfortunately even rounding is often not good enough. If the image is a synthetic 
image with gently graduated noise-free colouring then the fact that the image is only 
digitised to 8 or 10 bits (256 or 1024 step levels) may show up as “contouring” where 
the individual steps in brightness can be seen. This is particularly noticeable in 
graduations between mid and dark blue. The eye is exceptionally good at picking out 
these contour edges. Can anything be done about it? Yes. Dynamic Rounding. 


DYNAMIC ROUNDING 

Dynamic rounding is one of several methods of alleviating the problem, and is patented 
and trade-marked by Quantel. Dynamic rounding works by rounding upwards with 
the probability of the fractional part. In other words if the fraction is 3/4 then there 
will be a 75% probability that the final answer will be rounded up to the next larger 
value (and a 25% probability that it won’t). With conventional (static) rounding a 
fraction of 3/4 will always round up and produces exactly the same rounded answer as, 
say, 7/8. On average, Dynamic Rounding gives the correct answer to its full fractional 
accuracy. But more importantly, its effect is to de-correlate the contour edges 
mentioned above and so stops the eye spotting them. The licence is available to one 
and all for a fee of one dollar! There are some conditions, though. 
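
The principle of rounding up with the probability of the fractional part can be sketched as follows. I must stress that this is my own illustration in Python and not the Quantel implementation: it assumes 7 fractional bits and uses Python's general purpose random number generator, where real hardware would use its own random source.

```python
import random


def probabilistic_round(product: int, frac_bits: int = 7, rng=random) -> int:
    """Round up with a probability equal to the fractional part."""
    whole = product >> frac_bits
    fraction = product & ((1 << frac_bits) - 1)
    # A random number in 0..(2^frac_bits - 1) is below 'fraction' with
    # probability fraction / 2^frac_bits, so a fraction of 3/4 rounds up
    # 75% of the time and a fraction of zero never rounds up.
    if rng.randrange(1 << frac_bits) < fraction:
        whole += 1
    return whole


rng = random.Random(1)                          # seeded so the run is repeatable
value = 35 * 128 + 96                           # 35 + 3/4 in 1/128ths
results = [probabilistic_round(value, 7, rng) for _ in range(20000)]
print(sum(results) / len(results))              # close to 35.75
```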


BLACK OFFSET 

It has already been mentioned that the CCIR 601 sampling standard defines digital 
video level 16 as being black, rather than the more obvious level 0. In operations like 
the contrast-changing multiplier described above this fact should be taken into account. 
Clearly black should stay black when the contrast is changed, but in the circuit I have 
just described an input video level of 16 would be changed by the gain factor. e.g. a 
gain of 1.46 would change 16 to 23. This error should be compensated for, though I 
fancy it often isn’t! 
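
One way of making the compensation (a sketch of my own in Python, not a description of any particular product) is to remove the black offset before the multiplier and restore it afterwards, so that the gain acts about level 16 rather than about level 0. The gain is again in 1/128ths.

```python
BLACK = 16                          # CCIR 601 8 bit luminance black level


def contrast_ignoring_black(level: int, gain_128ths: int) -> int:
    """The circuit as first described: the gain is applied to the raw level."""
    return min((level * gain_128ths + 64) >> 7, 255)


def contrast_about_black(level: int, gain_128ths: int) -> int:
    """Remove the black offset, apply the gain, round, restore the offset."""
    product = (level - BLACK) * gain_128ths     # signed: undershoot gives a negative value
    rounded = (product + 64) >> 7               # Python's >> rounds towards minus infinity,
                                                # so add-half-then-shift still rounds to nearest
    return max(0, min(255, rounded + BLACK))    # limit both ends (lower limit is my assumption)


print(contrast_ignoring_black(16, 187), contrast_about_black(16, 187))
```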


A RECAP ON DIGITAL SCALING 

Any machine handling digital video, be it a dedicated hardware machine or a 
standard computer, will attempt to handle digital video levels as integer numbers. 
Integers are whole numbers, but not necessarily whole numbers of ones. Normally 
these will be the 8 or 10 bit integers which we have talked about so far although much 
higher numbers of bits will be maintained within processes. The alternative is floating 
point numbers. Floating point numbers are capable of much greater range and 
accuracy than integers, which sounds like a good thing. The trouble is that they take a 
lot more space to store, a lot more logic to handle and, in a general purpose computer, 
will be a lot slower to process. Integers are a good thing! 


[FIG 12 - More binary fractions] 

However, it is often useful to think of digital video as being in the range 0 to 1 rather 
than 0 to 255. The key word here is think. All that is necessary is to think that the 
8-bit video value is (0..255) 1/256ths rather than (0..255) ones. In other words the 
binary point is considered to be on the left rather than on the right. The integer is still 
an integer, just that it now represents a whole number of fractions. It is frequently just 
a concept but it may reflect itself into the hardware - at which point you need a clear 
head! 


DIGITAL MIXERS / KEYERS 


The traditional vision mixer has a fader arm which lets you mix (or dissolve or fade) 
between two video sources. Imagine that two video sources, “A” and “B”, have been 
selected into the mixer. At one end of the fader arm travel all of video source “A” is 
selected, and at the other end all of source “B” is selected. In between a fraction of 
source A is selected and added to (1-fraction) of source B. “Fraction” is dependent on 
the fader arm position. 


It is interesting to note that if the same video source is connected to both inputs of the 
mixer then the output shouldn't change as the fader arm is moved. 


THE MATHEMATICS OF THE VISION MIXER 
So what are the mathematics of the operation? 


OUTPUT = (K x A) + ((1 - K) x B) 


Where K is the fractional position of the fader arm. In a digital machine the fader arm 
position has to be digitised, usually to 8 or more bits. 


THE LOGIC OF THE DIGITAL VISION MIXER 

[FIG 13 - Digital Mixer/Keyer] 

The one problem which the digital vision mixer does not have is over-size numbers 
which need limiting. The output video levels must lie in between the two input video 
levels and therefore cannot be out of range of the 8 (or 10) bit input numbers. The test 
mentioned above where the output must not change if the two inputs are identical and 
the fader arm is moved gives us a clue to one requirement of the Mixer logic. The two 
coefficients by which the two inputs are multiplied MUST add up to “unity”. 
(DEFINITION: Coefficient - a gain factor or weight) 


i.e. (K) + (1 - K) = 1 


This is a frequent requirement of a number of digital video processes - mixers, 
interpolators and filters. All have two or more coefficients and one test of a good 
design is that if a “flat field” video test signal is processed it will come out unchanged. 


Full accuracy must be maintained throughout the arithmetic and the fractional bits 
should be rounded off rather than truncated - avoiding the dreaded green tinge. 


The final problem is in “K”. Firstly, (K) and (1-K) are in the range 0..1. OK, I 
covered that earlier - they can still be 8-bit integers, they are just thought of as 
fractions. In this case 1/256ths. But K also needs to be exactly unity when the fader 
arm hits the far end stop, while (1-K) needs to be exactly unity when the fader arm hits 
the near end stop. If this is not the case the Mixer will not be transparent when the 
fader arm is on the stops. The problem is that if we are dealing in 1/256ths, then the 
largest 8-bit number we can have is 255/256 - which is a little less than one - and we 
need 256/256. The usual solution is to recognise that a value of K of 255 is the end 
stop and to find some way of selecting all of source “A”, the source which K multiplies 
in the equation above. This emulates 256/256 - i.e. unity. An obvious alternative is to 
use a bigger multiplier. 
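
As a sketch of the end-stop trick in software (Python, with names of my own, and assuming K is an 8-bit number of 1/256ths): a K of 255 is treated as 256/256, so the two coefficients always add up to 256/256 and the source multiplied by K passes through untouched on the end stop.

```python
def mix(a: int, b: int, k: int) -> int:
    """OUTPUT = (K x A) + ((1 - K) x B), with K given as 0..255 in 1/256ths."""
    k256 = 256 if k == 255 else k               # recognise the end stop as unity
    total = k256 * a + (256 - k256) * b         # coefficients add up to 256/256
    return (total + 128) >> 8                   # round off the 8 fractional bits


print(mix(200, 50, 255), mix(200, 50, 0), mix(200, 50, 128))
```

Notice that no limiter is needed, and that if the same level is fed to both inputs the output does not change whatever the value of K.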


THE ALTERNATIVE MIXER LOGIC 
[FIG 14 - Alternative Mixer/Keyer logic] 
A simple bit of algebra converts 

OUTPUT = (K x A) + ((1 - K) x B) 

into 

OUTPUT = K x (A - B) + B 


A digital vision mixer implemented using the second equation has one less multiplier in 
it - and hardware multipliers are expensive while software multipliers are slow. 
However, that one multiplier needs to have a bipolar (positive and negative) 9-bit input 
because (A-B) can be anywhere between -255 and +255 - a range of 510 step levels. 
Another design decision to be made. 
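
The two forms can be compared directly in software. In this Python sketch (my own names; K is taken as 0 to 256 in 1/256ths so that unity is available) both versions keep full accuracy until the final rounding and therefore give identical answers.

```python
def mix_two_multipliers(a: int, b: int, k256: int) -> int:
    """OUTPUT = (K x A) + ((1 - K) x B)."""
    return (k256 * a + (256 - k256) * b + 128) >> 8


def mix_one_multiplier(a: int, b: int, k256: int) -> int:
    """OUTPUT = K x (A - B) + B."""
    difference = a - b                          # -255..+255: needs a signed 9-bit input
    return (k256 * difference + (b << 8) + 128) >> 8


same = all(
    mix_two_multipliers(a, b, k) == mix_one_multiplier(a, b, k)
    for a in range(0, 256, 5)
    for b in range(0, 256, 5)
    for k in range(257)
)
print(same)
```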


KEYERS 

A linear Keyer uses exactly the same logic as the vision mixer. The only difference is 
that instead of driving “K” with a value derived from a vision mixer fader arm it is 
driven by a “key” or “alpha-channel” video signal. 


The object of a Keyer is to select “foreground” video where the “key” is white and to 
select “background” video where the key is black. If the “A” input is the foreground 
video and the “B” input is the background video then the mixer logic will achieve 
exactly that. Actually it is better than that, because where the key video is in between 
black and white (i.e. grey) the Keyer output will be a mix of the two inputs - a linear 
key. This is exactly what is required if the final result is to look natural. The 
alternative is a “hard” key which simply switches between the inputs and produces 
jagged edges. 


There is a hidden problem here, due to the fact that the key video will probably be 
digitised to CCIR 601 - with level 16 = black and level 235 = white. Unless something 
is done about it, black or white key levels won’t put the keyer on its end stops - and 
the unwanted video source will “bleed” through. A case for re-scaling the key. 
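
A sketch of one possible re-scaling, in Python with names of my own: key level 16 is mapped to K = 0 and key level 235 to K = 256/256, with anything outside that range limited. The choice of 0 to 256 in 1/256ths for K is my assumption.

```python
KEY_BLACK, KEY_WHITE = 16, 235


def rescale_key(key_level: int) -> int:
    """Map a CCIR 601 key level to a mixer coefficient K of 0..256 (1/256ths)."""
    span = KEY_WHITE - KEY_BLACK                            # 219 step levels
    k = ((key_level - KEY_BLACK) * 256 + span // 2) // span # scale, rounding to nearest
    return max(0, min(256, k))                              # undershoot/overshoot stay on the stops


print(rescale_key(16), rescale_key(235), rescale_key(0), rescale_key(255))
```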


Frequently the key signal will be derived from the foreground - a “self-key”. This is 
most frequently used with blue-screen shots where a person stands in front of a blue 
backdrop or “screen”. 


I do not propose to go into the detail of how a blue-screen self key is created this 
evening. I will simply point out that blue-screen blue will occupy a certain “volume” 
within the total volume of the RGB Colour Cube I talked about earlier. “All” that is 
required is to detect if the colour of a pixel falls within that blue-screen “volume” and 
to create a black key when it does. The Keyer will then use that black key to select an 
alternative background. In the non-blue foreground area the generated key will be 
white, so in this area the Keyer will select the foreground itself. 


SPATIAL MANIPULATION - “FLYING PICTURES” 


Now we'll move on to the interesting part. 


THE DIGITISED TELEVISION FRAME. 

Before I begin to describe how a frame of television video can be manipulated spatially 
I will just recap on the basics of digitising. Remember that the picture on the television 
screen is “painted” by a dot which scans 625 horizontal lines from left to right across 
the screen while moving twice from top to bottom. Each line lasts 64us, of which 
about 52us are visible (active) while the remaining 12us are spent re-scanning from 
right to left (horizontal blanking). The active line is sampled 720 times, and hence 
there are 720 “pixels” per line. 


Each of the two vertical scans which make up a complete picture or “frame” is called a 
“field” - Field 1 and Field 2. The lines painted during the two fields are interleaved 
(“interlaced”) on the screen to form the complete picture. 


Each analogue field has half of 625 lines - 312 1/2 lines. Of these, 287 1/2 are visible 
(active) and the other 25 are vertical blanking lines during which the dot re-scans from 
the bottom of the screen back to the top. Hence there are a total of 575 visible lines in 
a complete television frame (or picture or image - call it what you will). 


CCIR 601 calls for 576 active lines - in other words it includes the whole of the half 
lines in the analogue specification. But of course there is no picture on the extra halves 
of these lines. Indeed even the “good” halves of these lines are just a nuisance during 
image manipulation and most DVEs crop them off when they move the picture. 


In the digital domain the two fields are whole numbers of lines long, and therefore have 
to be of different lengths. Field 1 is 312 lines long, while Field 2 is 313 lines long. 


[FIG 15 - Interlaced sloping lines on a TV screen with the half lines filling the 
corners] 

Of course the half lines are there to “fill in the corners” of the screen because the lines 
are sloping... This may have been significant for Baird who didn’t have many lines, 
but seems to me to be irrelevant to modern television when the domestic TV is always 
overscanned anyway. As far as I know, all DVEs ignore the fact that the lines on the 
TV screen are sloping and treat them as if they were horizontal. 


So. The television image can be considered to be a grid of luminance pixels, 720 wide 
by 576 high (574 if the half lines are cropped off). The Cb,Cr chrominance pixels are 
similar except that there are only 360 of each per line, co-sited (i.e. sampled at the 
same time as each other). 


If the video source is from a video camera, then the two interlaced fields which make 
up each frame must normally be assumed to be different (imagine the camera panning). 
If the video source is a telecine (i.e. film), then the two fields will represent the same 
image, albeit with a vertical offset. This distinction is important since treating the two 
fields as a single frame (from film) allows higher resolution to be maintained as the 
image is manipulated. However, if the source is from a video camera then the two 
fields must normally be handled separately or the result will be very unpleasant! 


THE FRAME STORE. 

[FIG 16 - A frame store as a grid of cells] 

To perform any significant spatial manipulation on a frame of television video you need 
a frame store. A frame store, as its name implies, can store a frame of television video 
(for the moment I will ignore field interlace). A frame store is a grid of "cells" into 
which you can put (write) or from which you can get (read) video pixel values. There 
will be 720 “cells” to a line, and 576 lines of “cells”, one cell for every pixel in a 
television image. Frame stores used for spatial manipulation need agile addressing so 
that you can get at the pixel values when you want them. 


[FIG 17 - A two port store - with two addresses and separate video input and 
video output] 

Framestores are normally treated as 2-port stores and have two addresses - the write 
address which determines where the current input video pixel value will be written and 
the read address which determines where the current output video pixel value will be 
read from. The frame store may be anything from a chunk of memory in a PC assigned 
as a software array to an explicit high speed static RAM in a dedicated effects machine. 


BENT WRITE ADDRESSES VS. BENT READ ADDRESSES 

[FIG 18 - Read-side vs. Write-side addressing] 

Picture manipulation (positioning (scrolling), sizing, rotating, perspective rotating and 
even bending) can be done in one of two basic ways - write-side manipulation and 
read-side manipulation. 


WRITE-SIDE MANIPULATION - Write it where you want it. 

[FIG 19 - “Bent” write addresses and raster-scanning read addresses] 

Write side processing is so-called because the picture manipulation is done as the video 
is written into the frame store. As each (raster scanned) input pixel arrives, the write 
side address generator calculates where that pixel will appear on the output TV screen 
and writes the pixel data into the matching “cell” in the frame store. After the last 
input pixel has been written the frame store contains the complete manipulated image 
as it will be seen on the TV screen; it can then be read out by a simple raster-scanning 
read address and displayed directly. 


READ-SIDE MANIPULATION - Read it when you need it. 

[FIG 20 - Raster-scanning write addresses and “bent” read addresses] 

In read side processing the situation is reversed. The input video data is written 
directly into the frame store by a simple raster-scanning write address and so the frame 
store contains the source image “as-is”. The core of the read address generator tracks 
the dot on the output TV screen, and, for each position on that screen, calculates 
which point on the source image (which is held in the frame store) to put there. That 
pixel value is then read from the frame store and can be displayed directly. 


ADDRESS GENERATION. 

I will ignore curvilinear (bent picture) effects and stick to simple flat-image 
manipulations. As it happens, the equations calculated by the address generators for 
scrolling, sizing, z-rotating and full 3D perspective rotation have exactly the same 
forms in both read-side and write-side machines. 


Simple positioning and sizing. 
[FIG 21 - Address generator with pixel & line count inputs, a..j inputs and H, V 
outputs] 

First I will deal with simple positioning (scrolling) and sizing. This is where it gets a 
little mathematical. 


Let: 
H     = calculated frame store horizontal address. 
V     = calculated frame store vertical address. 
x, y  = pixel count and line count from which the store addresses are 
        calculated. 
a..j  = parameters controlling scroll, size and (eventually) rotation. 

Simple position and size: 

H = ax + c   - “a” is the horizontal scale factor and “c” is the amount 
               of horizontal scroll, i.e. the horizontal address is a scaled 
               pixel count (x) plus a constant. 

V = ey + f   - “e” is the vertical scale factor and “f” is the amount of 
               vertical scroll. Notice that the vertical address is 
               constant along a line. 


“a” changes the horizontal size, while “e” changes the vertical size. They will normally 
be ganged together so that the image maintains its aspect ratio. “c” and “f” control the 
horizontal and vertical image position. Notice that the vertical address only changes 
once per line, which could allow savings in a hardware implementation. 


This is an appropriate time to point out the “back-to-frontness” of read side machines. 
In a write-side machine you write the picture where you want it. A larger value of “a” 
will cause the input pixels to be written into the frame store further apart, hence 
spreading the pixels and making the resultant image larger. The expected result. 
However, in a read side machine a larger value of “a” means that points on the source 
image which are further apart are read and displayed a fixed distance apart on the 
screen - so the image on the screen gets smaller. I will come back to this later. 

[FIG 22 - The effect of “a”=2 for both write-side and read-side machines] 
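
The back-to-frontness shows up clearly if the horizontal equation H = ax + c is tried on a single line of four pixels. This Python sketch is my own illustration, restricted to whole-number values of “a” and “c”; None marks a store location which was never written, or a read address which falls outside the source.

```python
def write_side(source, a, c, store_length):
    """Write each input pixel x into store location H = ax + c."""
    store = [None] * store_length
    for x, pixel in enumerate(source):
        h = a * x + c
        if 0 <= h < store_length:
            store[h] = pixel
    return store                    # read out later by a simple raster scan


def read_side(source, a, c, output_length):
    """For each output pixel x, read the source at location H = ax + c."""
    output = []
    for x in range(output_length):
        h = a * x + c
        output.append(source[h] if 0 <= h < len(source) else None)
    return output


line = [10, 20, 30, 40]
print(write_side(line, 2, 0, 8))    # pixels spread apart: the picture gets larger
print(read_side(line, 2, 0, 4))     # every other pixel read: the picture gets smaller
```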


Positioning, sizing and z-rotation. 
Mathematically, this is similar to simple positioning and sizing, but in addition we need 
“cross terms” which add some “y” into the horizontal store address and some “x” into 
the vertical address. 


Position, size and z-rotate: 


H = ax + by + c 
V = dx + ey + f 


As an example, for pure z-rotate this reduces to: 
H = (cosθ)x + (sinθ)y 
V = (-sinθ)x + (cosθ)y 
where θ is the rotate angle. With θ = 0 this gives H = x and V = y, i.e. no change. 


[FIG 23 - An upwards-sloping read address track across the frame store with a 
scan line of a counter-rotated output monitor scan line superimposed on it] 

The diagram shows the calculated read addresses in a read-side machine for a small 
amount of clockwise rotation. Notice that the read addresses have rotated 
anti-clockwise: this is necessary in a read-side machine because the video read from the 
frame store by this upwards-sloping address track will be displayed horizontally on the 
TV monitor - resulting in a displayed image which has rotated clockwise. As I said, 
read-side machines are back to front! 


Hardware address generators. 
Modern effects machines usually implement the above arithmetic in incrementers rather 
than using multipliers and adders. 
[FIG 24 - An Incrementer] 
An incrementer is a piece of hardware or software logic which adds a value to an 
accumulator once per pixel clock or once per line, or both. 
Consider the frame store horizontal read addresses relating to two adjacent pixels x1 
and x2 on the same line on the output monitor of a read-side machine. The calculated 
read side horizontal addresses, H1 and H2, for these output points will be: 

H1 = a x1 + b y1 + c 

H2 = a x2 + b y2 + c 

Hence by simple algebra: (H2 - H1) = a(x2 - x1) + b(y2 - y1) 

But since the two values for H relate to adjacent pixels on the same line then: 
(x2 - x1) = 1 and 
(y2 - y1) = 0 


Hence: H2 = H1 + a 
i.e. the value which must be added to the current horizontal address (H1) to generate 
the next horizontal address (H2) is simply “a”. 
The complication is that “a” must be specified with enough accuracy such that the 
accumulated error in the generated address after “a” has been added to the 
accumulator 720 times along a line is still acceptable. The initial value of the 
incrementer at the start of each line must also be calculated (by the controlling 
computer). Dedicated LSI chips are available for this method of address generation. 
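
The incrementer and its accuracy requirement can be tried out in software. The Python sketch below is my own illustration, not a model of any particular chip: “a” is held as a fixed-point number with a chosen number of fractional bits, the start-of-line value is calculated once, and “a” is then simply added once per pixel.

```python
def addresses_by_multiply(a, b, c, y, width):
    """H = ax + by + c calculated in full for every pixel on line y."""
    return [a * x + b * y + c for x in range(width)]


def addresses_by_incrementer(a, b, c, y, width, frac_bits=16):
    """The same addresses from an accumulator which adds 'a' once per pixel."""
    scale = 1 << frac_bits
    step = round(a * scale)                 # 'a' as a fixed-point number
    accumulator = round((b * y + c) * scale)  # initial value for the line, from the controller
    addresses = []
    for _ in range(width):
        addresses.append(accumulator / scale)
        accumulator += step                 # next address = current address + a
    return addresses


def worst_error(a, frac_bits):
    exact = addresses_by_multiply(a, 0, 0, 0, 720)
    stepped = addresses_by_incrementer(a, 0, 0, 0, 720, frac_bits)
    return max(abs(e - s) for e, s in zip(exact, stepped))


print(worst_error(1 / 3, 8), worst_error(1 / 3, 16))
```

With only 8 fractional bits the address has drifted by most of a pixel at the end of a 720 pixel line for this value of “a”; with 16 fractional bits the drift is a small fraction of a pixel.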


Positioning, sizing and three-dimensional rotation with perspective. 
This is where things get really complicated and interesting. 


H = (ax + by + c) / (gx + hy + j) 

V = (dx + ey + f) / (gx + hy + j) 


I don’t intend to prove that these equations can produce a perspectived 3D rotation or 
the interesting fact that exactly the same equations, but with different parameters, work 
for either read-side or write-side addressing machines. I will simply point out that the 
equations require a (high accuracy) division - and that it has to be done 13.5 million 
times a second in a real-time broadcast television effects machine! 
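
Written out as software the pair of equations looks innocent enough. This Python sketch (function and parameter names are mine) uses floating point for clarity, which a real-time machine could not afford, and assumes that the denominator never reaches zero.

```python
def perspective_address(x, y, params):
    """H = (ax + by + c) / (gx + hy + j),  V = (dx + ey + f) / (gx + hy + j)."""
    a, b, c, d, e, f, g, h, j = params
    denominator = g * x + h * y + j         # shared by both addresses; assumed non-zero
    return ((a * x + b * y + c) / denominator,
            (d * x + e * y + f) / denominator)


# With g = h = 0 and j = 1 the division disappears and we are back to
# simple positioning and sizing.
print(perspective_address(3, 4, (2, 0, 5, 0, 2, 7, 0, 0, 1)))
print(perspective_address(1, 4, (2, 0, 5, 0, 2, 7, 1, 0, 1)))
```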


THE CHOICE - READ SIDE OR WRITE SIDE? 

As described, both read-side and write-side addressing machines sound as though they 
will work equally well. So what decides the choice? The fact is that the machines 
described so far would produce some pretty awful pictures! They each suffer from 
both “nearest-neighbour” pixel selection and “aliasing” which I will explain in a 
moment. In addition the write side machine cannot expand a picture without 
producing holes in it - as was, actually, shown in the diagram above where the picture 
was expanded by a factor of 2. 


Holey expansion in a write side machine. 

[FIG 25 - Write side with “a” = 2 and missed locations.] 

Let me dispose of this one first. As the picture is expanded in a write side machine the 
incoming pixels have to be written further and further apart in the frame store. 
Inevitably some locations in the store will not be written at all - and appear as holes in 
the image when it is read out of the frame store and displayed. In fact this also 
highlights another problem with write-side machines - what is actually read from the 
unwritten store locations? If you are not careful it will be the remains of the previous 
image stored in the store. 


Nearest Neighbour pixel selection. 

I have made no mention of the absolute accuracy required of the address generators 
described above. Clearly they can be designed to have any required accuracy and need 
not be constrained to produce only whole number addresses. In a typical effects 
machine the addresses will have between 4 and 8 fractional address bits; 4 fractional 
bits means that the calculated store addresses are expressed to the nearest 1/16th of a 
pixel. 


[FIG 26 - A few pixels in a matrix with a calculated read address lying between 
them] 

But frame stores only have whole number pixel addresses, so what do you do with the 
fractions? You interpolate the image data, that’s what! In a read-side machine this 
means that you invent output pixel values for points which lie between the actual 
stored pixels. 

And in a write-side machine...... 


In the write-side machine described earlier, each incoming pixel was written into a 
computed location in the frame store in order to produce the required image 
manipulation. But it was only written into one location. Interpolating in a write-side 
machine gets complicated. 


So why should anyone want to build a write-side machine? The only advantage, and it 
is a significant advantage, comes when you want to make curved shapes. In computer 
jargon, when you want to texture map live video over 3D shapes. As we saw above, 
read-side machine addressing works back to front; no big problem for flat, albeit 3D 
rotated and perspectived, pictures. But calculating the “read-side transformed” 
read-side store addresses for a curved shape really does get difficult and I don’t intend to 
tackle it tonight. Suffice it to say that when we designed the Quantel Mirage twelve 
years ago we ducked that problem and built a write-side machine. Along the way we 
solved the interpolation problem and the “aliasing” problem - and the problem of how 
to build a random access store out of the slow DRAMs of the day - but we didn’t 
entirely solve the “holey expansion” problem. And I don’t know anyone else who has. 
So let’s ignore write-side addressing machines and concentrate on read-side machines. 


INTERPOLATORS 


Interpolator arithmetic. 

An interpolator is the logic - software in a computer or hardware in a dedicated effects 
machine - which invents output pixels which lie in between real pixels when the 
address generator comes up with a fractional address. Nearest-neighbour pixel 
selection avoids the need for an interpolator by rounding the computed addresses to 
the nearest whole pixel, and then selecting that one pixel. It is an obvious speed-up 
trick in a pure software (i.e. computer) machine, but it produces decidedly poor quality 
pictures. 
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LINEAR INTERPOLATION. 

Imagine that a 3.5MHz sine wave is digitised at 13.5MHz (the industry standard
sampling frequency for Broadcast products). If we wish to invent in-between pixel
values using only the actual pixel values on either side then the best we can do is to
draw a straight line between two sample values and move the appropriate fraction of
the way along the line. This is known as linear interpolation.


[FIG 27 - Example of single axis interpolation] 
In the first example the pixel we are trying to invent has no fractional vertical address 
component but it has a fractional horizontal address of 3/16ths (0011 binary). 


[FIG 28 - Digitised values from a 3.5MHz sine wave (excluding the waveform)] 
The five samples in the Diagram are actually samples of a 3.5MHz sine wave. For 
linear interpolation we draw a straight line between the values of the samples to the left 
and right of the pixel we are trying to invent, and then move 3/16ths of the way along 
that line to find the interpolated value. 


Mathematically, the invented pixel is made up of 3/16 of the pixel to the right of it and
(1-3/16) = 13/16 of the rather closer pixel to the left of it. In general, considering only
fractional addresses in the horizontal axis:
Let H_f = fractional part of computed horizontal read address.
P_L = value of pixel to the left of the required point.
P_R = value of pixel to the right of the required point.
P_I = required value of the interpolated pixel.


Then: P_I = (1 - H_f) P_L + (H_f) P_R
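
As a rough software illustration of the single-axis formula, here is a minimal Python sketch. It assumes the 4-bit fraction from the earlier example, so the fractional address arrives as a whole number of 1/16ths; the function name `lerp_pixel` and the add-8-then-divide rounding are my own choices for illustration, not a description of any particular machine.

```python
def lerp_pixel(p_left, p_right, frac16):
    """Linear interpolation between two neighbouring pixels.

    frac16 is the fractional read address in 1/16ths of a pixel (0..15),
    i.e. H_f = frac16 / 16.
    """
    if not 0 <= frac16 <= 15:
        raise ValueError("frac16 must be in 0..15")
    # P_I = (1 - H_f) * P_L + H_f * P_R, kept in integers.
    # Adding 8 (half of 16) before the divide rounds to the nearest level.
    return ((16 - frac16) * p_left + frac16 * p_right + 8) // 16


# 3/16ths of the way from a pixel of value 100 to a pixel of value 132
print(lerp_pixel(100, 132, 3))
```

A fraction of 0 returns the left pixel unchanged, which is the nearest-neighbour result.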


[FIG 29 - Example of Bi-linear (2D) interpolation]
This is easily extended to include fractional addresses in both axes:
Also let V_f = fractional part of computed vertical read address.
P_{LT} = value of pixel to the left and above the required point.
P_{RT} = value of pixel to the right and above the required point.
P_{LB} = value of pixel to the left and below the required point.
P_{RB} = value of pixel to the right and below the required point.


Then: P_I = (1 - H_f)(1 - V_f) P_{LT} + (H_f)(1 - V_f) P_{RT}
          + (1 - H_f)(V_f) P_{LB} + (H_f)(V_f) P_{RB}


This is known as bi-linear interpolation or “4-point” interpolation.
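
The two-axis formula can be sketched in Python in the same way. This is an illustration under assumptions of my own: the image is a list of rows, read addresses are given in 1/16ths of a pixel, "above" means the lower-numbered row, and neighbours that would fall off the edge are clamped to the last pixel. None of those details come from the hardware described here.

```python
def bilinear(image, x16, y16):
    """Bi-linear (4-point) interpolation.

    image is a list of rows of pixel values; x16 and y16 are the read
    address in 1/16ths of a pixel.
    """
    x, hf = divmod(x16, 16)   # whole-pixel address and H_f * 16
    y, vf = divmod(y16, 16)   # whole-line address and V_f * 16
    # Clamp the right/bottom neighbours at the picture edge.
    x1 = min(x + 1, len(image[0]) - 1)
    y1 = min(y + 1, len(image) - 1)
    p_lt, p_rt = image[y][x], image[y][x1]
    p_lb, p_rb = image[y1][x], image[y1][x1]
    # The four weights are products of 1/16ths, so they total 256.
    acc = ((16 - hf) * (16 - vf) * p_lt + hf * (16 - vf) * p_rt
           + (16 - hf) * vf * p_lb + hf * vf * p_rb)
    return (acc + 128) // 256   # round to the nearest level


corner_values = [[0, 160],
                 [80, 240]]
# Half a pixel right and half a line down: the mean of the four corners.
print(bilinear(corner_values, 8, 8))
```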
Digital video processing building blocks. 


This all looks fairly complicated stuff, but it uses the adders and multipliers which I
covered earlier and introduces an additional building block, the Look-Up Table (LUT).

First of all, let’s express the above equation in terms of the coefficients (or weights)
applied to the four “corner” pixels. There are four coefficients, one for each corner:
C_{LT}, C_{RT}, C_{LB} and C_{RB} where:


C_{LT} = (1 - H_f)(1 - V_f)     left top corner coefficient
C_{RT} = (H_f)(1 - V_f)         right top corner coefficient
C_{LB} = (1 - H_f)(V_f)         left bottom corner coefficient
C_{RB} = (H_f)(V_f)             right bottom corner coefficient


with each coefficient having a value between 0.0 and 1.0.


Hence: P_I = (C_{LT}) P_{LT} + (C_{RT}) P_{RT}
           + (C_{LB}) P_{LB} + (C_{RB}) P_{RB}


Notice that the sum of the four coefficients must add up to 1.0. If they don't, then the 
interpolated image will get brighter or darker - easily tested with a flat-field test image. 


INTERPOLATOR LOGIC. 

[FIG 30 - Bi-linear interpolator logic] 

Look back at the bit I skipped - how to generate the four corner coefficients for the
interpolator. The calculations here seem quite complex, and if one was to do them
directly in software they would be quite slow. For example the top left coefficient is:


C_{LT} = (1 - H_f)(1 - V_f)


which involves two subtractions and a multiply. The thing to notice here is that it is a
“function” of two variables - the horizontal and vertical address fractions. If these
fractions are, say, 6 bits each, then what we need is a “black box” with 12 input lines
(two sets of 6 bits) and as many output lines as we need for accuracy. The black box
could be full of multipliers and subtractors, but a much simpler hardware solution is to
use a memory chip as a Look-Up Table (LUT).


A LUT is a pre-computed table of values which allow the required answer to be
“looked-up” rather than calculated every time it is needed. Typically a PROM in a
dedicated hardware machine: in this case a 4K word PROM. They cost just a pound
or two and are very small - what else could you want? In a standard computer a LUT
would be a software array.
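
In software terms the coefficient LUT might look like the sketch below. It assumes the 6-bit fractions mentioned above, giving 4096 entries per corner, and it stores each coefficient as an exact integer count of 1/4096ths. A real PROM would have fewer output bits, and rounding each coefficient separately could stop the four from summing to unity, so the word width here is purely an illustrative assumption, as are all the names.

```python
FRAC_BITS = 6
STEPS = 1 << FRAC_BITS        # 64 steps per pixel in each axis
SCALE = STEPS * STEPS         # 4096: table size and coefficient denominator


def build_lut(func):
    """One 4K-entry table addressed by the 12-bit value (hf << 6) | vf."""
    return [func(addr >> FRAC_BITS, addr & (STEPS - 1)) for addr in range(SCALE)]


LUT_LT = build_lut(lambda hf, vf: (STEPS - hf) * (STEPS - vf))
LUT_RT = build_lut(lambda hf, vf: hf * (STEPS - vf))
LUT_LB = build_lut(lambda hf, vf: (STEPS - hf) * vf)
LUT_RB = build_lut(lambda hf, vf: hf * vf)


def coefficients(hf, vf):
    """Look up the four corner coefficients instead of calculating them."""
    addr = (hf << FRAC_BITS) | vf
    return LUT_LT[addr], LUT_RT[addr], LUT_LB[addr], LUT_RB[addr]


# The flat-field requirement: every set of four coefficients totals unity
# (here 4096/4096) whatever the fractional address.
flat_field_ok = all(sum(coefficients(h, v)) == SCALE
                    for h in range(STEPS) for v in range(STEPS))
print(flat_field_ok)
```

The subtractions and the multiply are done once, when the tables are built, and never again per pixel.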


Interpolator accuracy. 

[FIG 31 - Digitised values from a 3.5MHz sine wave (including the waveform)] 
Clearly the linear interpolator will in general invent a better in-between pixel value than
the nearest-neighbour method of simply selecting the nearest real pixel value. It is also
clear from this diagram that it could be better. But is it worth it? That depends on the
application.


[FIG 32 - 3.5MHz sine wave with symmetrical samples] 

Take the same 3.5MHz sine wave we used earlier. Imagine that the 13.5MHz samples
happen to lie symmetrically on either side of the positive peak of the sine wave.
Imagine also that we want to scroll (re-position) the whole image by one half pixel.
What the interpolator has to do is to invent output pixels which lie mid way between
the input sample points. Clearly the best it can do for the (invented) positive peak is to
calculate a value which is the same as the sample values either side of the peak - a
value which is only about 69% of the actual peak value (cos(pi x 3.5/13.5) = 0.69).
This attenuation applies for any phase of 3.5MHz sine wave with a half pixel scroll,
although I do not intend to prove it.


[FIG 32 (again) - 1.5MHz, 3.5MHz and 4.2MHz sine waves]

If we repeat the interpolation with other frequencies of sine waves we can see that the 
invented peak value gets more accurate as the frequency gets lower (perfectly accurate 
at DC!) and worse as the frequency gets higher. 


[FIG 33 - Linear (2-point) interpolator frequency response] 

The diagram shows the result of calculating the attenuation of the linear interpolator 
for a half pixel scroll for a range of frequencies. The gain at 5.5MHz - the bandwidth 
limit for 625 line Broadcast Television - is only about 30%. As a matter of interest, 
notice also that the gain is zero at 6.75MHz - the half-sampling frequency. Nyquist 
had something going for him after all! 
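
For anyone who wants to reproduce a curve of this kind, here is a small Python sketch. It rests on one piece of trigonometry: averaging two samples taken half a sample period either side of a point gives 0.5[cos(p - a) + cos(p + a)] = cos(p)cos(a), where a = pi x f/fs, so the gain is cos(pi x f/fs) whatever the phase p. The function name and the list of frequencies are my own; the sketch models only the half pixel scroll case.

```python
import math

FS_MHZ = 13.5   # sampling frequency


def half_pixel_gain(f_mhz, fs_mhz=FS_MHZ):
    """Gain of a 2-point interpolator with weights (0.5, 0.5) at f_mhz."""
    return math.cos(math.pi * f_mhz / fs_mhz)


for f in (0.0, 1.5, 3.5, 4.2, 5.5, 6.75):
    print(f"{f:5.2f} MHz  gain = {half_pixel_gain(f):.2f}")
```

This simple model evaluates to roughly 0.69 at 3.5MHz and 0.29 at 5.5MHz, and falls to zero at the half-sampling frequency.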


Four-point (quadratic) interpolator. 

What can we do to improve things? Well, that man Nyquist says that it is possible to 
reconstruct every point on the digitised waveform. But only if we use every sample on 
either side of the point we are trying to invent. Clearly impractical. But it does give us 
a clue to a better approach. If we take two samples on either side of the sample we are 
trying to invent we can obviously draw a curve through them which will be a better fit 
to the original waveform than just joining up the points with straight lines. Such a 
single axis 4-point interpolator is known as a quadratic interpolator. 


[FIG 34 - Quadratic interpolator frequency response] 

One of the problems with the quadratic interpolator is that it gives us more choices. 
The total of the four weights or coefficients still has to add up to unity (or the 
interpolated image will get brighter or darker), but we can vary the ratio of the inner 
and outer coefficients. The diagram shows a whole family of frequency response 
curves. We can boost the high frequency response as much as we like, but only by 
incurring the penalty of having excessive mid-frequency gain. 
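
The trade-off can be explored numerically. The sketch below is an illustration under my own assumptions, not the coefficient set of any real machine: a half pixel scroll, symmetric weights (outer, inner, inner, outer), and the inner weight fixed by the requirement that all four total unity. The samples then lie 0.5 and 1.5 sample periods either side of the invented point.

```python
import math


def four_point_gain(f_mhz, outer, fs_mhz=13.5):
    """Half pixel scroll gain of a symmetric 4-point interpolator.

    outer is the weight on each outer sample; the inner weight follows
    from the unity-sum rule.
    """
    inner = 0.5 - outer                  # so 2*inner + 2*outer == 1
    a = math.pi * f_mhz / fs_mhz         # phase step for half a sample
    return 2 * inner * math.cos(a) + 2 * outer * math.cos(3 * a)


for outer in (0.0, -1 / 16, -0.25):
    gains = [four_point_gain(f, outer) for f in (1.5, 3.5, 5.5)]
    print(f"outer = {outer:+.4f}:", " ".join(f"{g:.2f}" for g in gains))
```

With outer = 0 this collapses to the linear interpolator. Making the outer weights more negative lifts the high-frequency gain, but by outer = -0.25 the gain at 1.5MHz has already risen above unity, which is the mid-frequency penalty described above.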


[FIG 35 -Various interpolator frequency responses] 

More points ought to make things better, and indeed they do. This diagram shows the 
frequency responses for 2, 4, 6 and 8 point interpolators. Juggling the relative values 
of the various coefficients would probably generate even better curves. So why not 
use more points? Cost. 


A bi-linear (2x2) interpolator requires the four surrounding pixels to be read from the 
frame store for every output pixel. This significantly complicates the store structure 
compared to nearest-neighbour pixel value selection. 


As we increase the number of points which we interpolate across in each axis we need
the square of that number to do it in two dimensions. A 4x4 (bi-quadratic) interpolator
requires us to design a store which will access 16 pixels at one time to create each
output pixel, while an 8x8 interpolator would need 64 pixels at a time. All at 13.5
million times a second in a real-time effects machine. Daunting.

An obvious cause for compromise. Quality can be traded against speed or cost. In a
paint system, which only has to manipulate a still image, once, a second or two might
be a price worth paying. But in a machine intended to manipulate moving video,
probably not.


FILTERS. 

ALIASSING.

[FIG 36 - Read-side squash-by-3 (“a” = 3)] 

Imagine a read-side machine doing a horizontal-only size reduction by a factor of 3.
This requires parameter “a” = 3 in the equations given earlier. It will result in a read
address sequence which hops across the frame store 3 locations at a time. Say
1,4,7,10,...


What happens to the picture information in the store locations which get missed
(2,3,5,6,8,9,...)? It is obviously just ignored. Imagine further that the squashed image
is now scrolled slowly across the screen, one pixel at a time. So the next output image
will be created by a read address sequence of 2,5,8,11,... Clearly image detail will
come and go as the image is scrolled. This is aliassing.
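
The effect is easy to demonstrate in a few lines of Python. This sketch assumes nearest-pixel reading with no interpolation or filtering, uses zero-based store addresses (so the sequence 1,4,7,... above becomes 0,3,6,...), and uses a test pattern of my own choosing: one bright pixel in every three.

```python
def squash_by(pixels, a, offset=0):
    """Read-side squash with no filtering: read every a-th stored pixel."""
    return pixels[offset::a]


# Fine detail: a one-pixel-wide bright stripe repeating every 3 pixels.
store = [255 if i % 3 == 0 else 0 for i in range(12)]

print(squash_by(store, 3, 0))   # read addresses 0,3,6,9: every stripe is hit
print(squash_by(store, 3, 1))   # scrolled by one pixel: every stripe is missed
```

The same stored picture comes out as full white at one scroll position and full black at the next.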


What can be done about aliassing? A glance at what a television camera would
produce if it was trying to do the same thing is often useful. If we zoom the lens on a
camera so that the image gets smaller then fine detail will lose amplitude and finally
disappear - but it shouldn't come and go in the manner described above. What we
need to do is to include information from more and more pixels in the calculated
output as the image gets smaller.


We can do this in two ways: 

1. Input filtering. Doing some sort of averaging of the image on the way into the
frame store (blurring it, so that it doesn’t matter which pixel you choose to read, you
will still pick up some of all of the detail).

2. Output filtering. Doing some sort of averaging of the image on the way out of the
frame store as it is read. This can usefully be included with the interpolation
process, but it takes us back to the problem of accessing many store locations for
each output pixel.


A clue to what we should do is given by Nyquist. Nyquist said, as you have been told 
in previous lectures, that the analogue signal should be filtered so that there are no 
frequency components above half the sampling frequency. In the case of broadcast 
television with a sampling frequency of 13.5MHz, this means a cut-off of 6.75MHz. 
Difficult but possible when we also want a pass band up to 5.5MHz for broadcast 
video. 


However, in our squash-by-3 example, we are only reading every third pixel sample
out of the frame store. It is as though we had sampled the image at one third of the
actual sample rate - 4.5MHz. If we had actually done that, then Nyquist says that we
should have removed all frequency components above 2.25MHz, which is what we
therefore need to do. An image filtered in this way would look blurred or defocused if
viewed full size.
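
A crude way to see input filtering at work is to average each pixel with its neighbours before the squash. The 3-pixel box average in this Python sketch is my own stand-in for the filter; it is far from a proper 2.25MHz low-pass, and the stripe test pattern is likewise chosen only for illustration.

```python
def box_prefilter(pixels, width):
    """Average each pixel with its neighbours (edges use what is available)."""
    half = width // 2
    out = []
    for i in range(len(pixels)):
        window = pixels[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


# One bright pixel in every three, as fine detail.
store = [255 if i % 3 == 0 else 0 for i in range(12)]
blurred = box_prefilter(store, 3)

# Squash by 3 at two different scroll positions (away from the edges).
print(blurred[3::3])
print(blurred[4::3])
```

Both scroll positions now give the same mid-grey output: the detail has lost amplitude, as it would through a camera lens, instead of coming and going.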


A variable frequency analogue filter is just about possible, but does not help if our 
input is already digitised. In any case, if we squash the image vertically then we will 
want to filter it vertically as well. And if we want to do 3D rotations with perspective 
(which causes the horizontal and vertical compression to vary across the image), then 
we want to vary the filtering to match. 


[FIG 37 - Input filter in a typical read-side DVE]

One answer has to be a digital filter, at the input to the frame store. This filter has to 
be a two dimensional filter (both horizontal and vertical) and it has to be able to change 
its filtering at (or close to) pixel rate - 13.5 million times a second. 


[FIG 38 - A 5-tap FIR Filter] 

An FIR or transversal filter is the favoured type of filter. Such a filter has linear phase
or constant group delay - the same thing. What this means is that the blurring effect of
a low pass FIR is symmetrical. The alternative filter type, the IIR or recursive filter, has
non-linear phase and such a low pass filter would produce asymmetric blurring.


The trouble with FIRs is that their effect is limited by their size. The output from the
5-tap FIR shown in FIG 38 cannot include data from more than two pixels on either
side. If we wanted to squash the picture by a factor of, say, 10, then we would need an
FIR with at least 11 taps. In fact, to get a fast frequency roll-off, you would need a lot
more taps than that. Clearly it gets impracticable, particularly when the filter has to be
adaptive, pixel to pixel. For this reason a compromise may be made - and the resultant
aliassing at small picture sizes will be the consequence.
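
A 5-tap transversal filter is short enough to write out in full. In this Python sketch the tap weights 1, 4, 6, 4, 1 (a binomial set summing to 16) are my own choice for illustration and are not taken from FIG 38; the edge handling, which repeats the end samples, is also an assumption.

```python
def fir(samples, taps):
    """Transversal (FIR) filter with the output aligned to the centre tap."""
    n = len(taps)
    if n % 2 == 0:
        raise ValueError("use an odd number of taps")
    half = n // 2
    last = len(samples) - 1
    out = []
    for i in range(len(samples)):
        acc = 0
        for k, coeff in enumerate(taps):
            j = min(max(i + k - half, 0), last)   # repeat the end samples
            acc += coeff * samples[j]
        out.append(acc)
    return out


TAPS = [1, 4, 6, 4, 1]                   # symmetric, sum = 16
impulse = [0, 0, 0, 0, 16, 0, 0, 0, 0]   # a single bright pixel

# Divide by the tap sum to restore unity gain.
print([v // 16 for v in fir(impulse, TAPS)])
```

The single pixel is smeared equally to left and right, which is the symmetrical blurring that linear phase implies, and it spreads no further than two pixels either side, which is the size limit described above.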


OUTPUT FILTERING 

A wide (in two dimensions) combined filter and interpolator on the output of the frame 
store is a possibility. However it will be difficult and expensive to implement at full 
video rate and will probably be reserved for still image manipulation where speed can 
be traded against cost. 


PRACTICAL FRAME STORES

The frame store which I described earlier was not very practical. It required a two-port
store (and genuine 2-port stores are difficult and expensive to build) and it
ignored the whole problem of trying to write data into a store at the same time as you
are trying to read it out. Imagine that we have a read-side machine and we have
inverted the picture so that we are reading the frame store from the bottom up.
Initially we will be reading values which were written during the previous frame (or
field), but half way through we will cross the write address coming the other way and
begin reading values which have only just been written. A phenomenon known as
cross-over which causes pictures to split in the middle.
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THE FLIP-FLOP FRAME STORE 

[FIG 39 - The Flip-Flop Store] 

One solution is the flip-flop frame store. The frame store actually consists of two
single-port stores, one being written while the other is being read. The stores can be
either field stores (which only store one field) or frame stores (which store a whole
frame - two fields). The problem with such a store is that single fields must be read
if the moving video source is a video camera, although higher vertical resolution can be
achieved by reading a whole frame at a time but only if the video source is from film (a
telecine). The two versions are known as field machines and frame machines
respectively.
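
Software people will recognise the flip-flop store as double buffering. A minimal Python sketch, with class and method names of my own invention and a store size chosen only for the example:

```python
class FlipFlopStore:
    """Two single-port banks: one is written while the other is read."""

    def __init__(self, size):
        self._banks = [[0] * size, [0] * size]
        self._write_bank = 0

    def write(self, addr, value):
        self._banks[self._write_bank][addr] = value

    def read(self, addr):
        # Reads always come from the bank that is NOT being written,
        # so a read can never cross over into freshly written data.
        return self._banks[1 - self._write_bank][addr]

    def swap(self):
        """Call once at each field (or frame) boundary."""
        self._write_bank = 1 - self._write_bank


store = FlipFlopStore(4)
for addr in range(4):
    store.write(addr, 10 * addr)      # simple raster write of one field
store.swap()
# Read the completed field from the bottom up while the next one is written.
print([store.read(addr) for addr in reversed(range(4))])
```

Because the read address and the write address are always in different banks, an inverted picture no longer splits in the middle.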


REDUCED VERTICAL RESOLUTION IN FIELD MACHINES 

If a swapping field store design is used, then only the video data from a single field will 
be available to the interpolator. It will have to invent vertical in-between pixels by 
interpolating between adjacent field lines - which are not adjacent lines in the image 
(“on the glass”). 


MOVEMENT DETECTION 

Imagine a scene shot with a locked-off video camera with people moving in the 
foreground. Clearly the foreground image changes every field and only pixels from a 
single field must be processed together. However the background is unchanging and 
would be handled with higher vertical resolution if all the lines in the whole frame were 
processed together. What we want is a machine which is a field machine and a frame 
machine at the same time. Think about it. 


THE TWO-PASS ALGORITHM 

[FIG 40 - 2-Pass Z-axis Rotate] 

I am not an expert on other manufacturers' DVEs. However I should mention an
alternative store structure which was used by the very first full 3D perspective effects 
machine - the Ampex ADO. I understand that this machine used the “two pass 
algorithm”. This method avoids the need to move data between lines in anything other 
than a very structured way, and works by first shearing the image horizontally and 
then shearing it vertically. The vertical shear is actually achieved by first rotating the 
image by a fixed 90 degrees and then shearing it horizontally. A final 90 degree fixed 
rotate may or may not be needed depending on the final orientation of the image. 


The diagram shows the technique used for a simple z-axis rotation. However it is
possible to perform full perspective 3D rotations using the method (so I am told!). 
The benefits stem from it being a single-axis-at-a-time method. This means that large 
read-side interpolator/filters are easier to build, and a less agile store is needed. This 
latter feature was particularly important in the early 80s when DRAMs were the order 
of the day and static RAMs were too small and too expensive. 
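
To show that a rotation really can be split into two single-axis passes, here is a Python sketch of one well-known factorisation, the scale-and-shear decomposition usually credited to Catmull and Smith. Whether the ADO used exactly this factorisation is not something I can confirm; it is offered only as an assumed example of the principle, and the function name is mine.

```python
import math


def two_pass_rotate(x, y, theta):
    """Rotate the point (x, y) by theta using two single-axis passes."""
    c, s = math.cos(theta), math.sin(theta)
    # Pass 1, horizontal only: every point stays on its own line.
    x1 = c * x - s * y
    y1 = y
    # Pass 2, vertical only: every point stays in its own column.
    x2 = x1
    y2 = (s / c) * x1 + y1 / c
    return x2, y2


angle = math.radians(30)
print(two_pass_rotate(1.0, 0.0, angle))
print((math.cos(angle), math.sin(angle)))   # direct rotation, for comparison
```

Substituting pass 1 into pass 2 gives y2 = x sin(theta) + y cos(theta), the ordinary rotation. Note that this particular factorisation divides by cos(theta), so it breaks down as the angle approaches 90 degrees.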


COLOUR SPACE


THE RED / GREEN / BLUE COLOUR CUBE


[Diagram: the RGB colour cube with the LUMINANCE axis marked; surviving corner labels: BLACK, BLUE, RED, CYAN, YELLOW]

The points of the cube are the colour-bar colours, and contain either 0% or 100% of
each of red, green and blue.
Any point on the LUMINANCE axis has equal amounts of red, green and blue and
therefore has no colour - i.e. it is grey.
FIG - 1


BOTTOM VIEW OF THE RGB COLOUR CUBE


[Diagram: the colour cube viewed from below; surviving label: MAGENTA]

The LUMINANCE axis points into the page
FIG - 2

COLOUR SPACE


BOTTOM VIEW OF THE RGB COLOUR CUBE


HUE AND SATURATION

[Diagram: the same view with axes -Cb to +Cb and -Cr to +Cr; surviving labels: RED, YELLOW, GREEN, CYAN, BLUE, SATURATION]


SATURATION is the distance from the centre (the origin).


HUE is the angle of rotation.


FIG - 3


Copyright QUANTEL 1994 


EXAMPLE OF A BINARY NUMBER


Decimal      8-bit Binary

35       =   00100011


Binary digits (each 2x the value of the one to its right):

Number of 1s:   1 x 1  =  1
Number of 2s:   1 x 2  =  2
Number of 4s:   0 x 4  =  0
Number of 8s:   0 x 8  =  0
Number of 16s:  0 x 16 =  0
Number of 32s:  1 x 32 = 32
                         --
                         35


Decimal digits (each 10x the value of the one to its right):

Number of 1s:   5 x 1  =  5
Number of 10s:  3 x 10 = 30

FIG - 4

EXAMPLES OF BINARY (DIGITAL) ADDITION:


IN-RANGE
DECIMAL    8-BIT BINARY
   20        00010100
 + 35      + 00100011
   55        00110111    <-- 8-bit number - OK

OUT OF RANGE
DECIMAL    8-BIT BINARY
  230        11100110
 + 35      + 00100011
  265       100001001    <-- 9-bit number - limit to 11111111
  255        11111111    <-- 8-bit number - OK


FIG - 8

MORE ON BINARY FRACTIONS:


MULTIPLICATION

DECIMAL                             16-BIT BINARY
35 x 187        = 6545            = 0001100110010001
35 x 187/128    = 6545/128
                = 51 + 17/128     = 000110011.0010001
Gain of 1.46 (approx). To the left of the binary point: 51. To the right: 17 (/128).

DECIMAL                             16-BIT BINARY
255 x 255       = 65025           = 1111111000000001
(255 x 255)/128 = 508 + 1/128     = 111111100.0000001
To the left of the binary point: 508. To the right: 1 (/128).
Requires limiting to 11111111 (=255).


FIG - 11
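
The multiplication examples above translate directly into integer arithmetic. In this Python sketch the gain is an 8-bit number with the binary point after its top bit, so that 128 means unity gain, matching the worked figures; adding 64 (half of 128) before dropping the bottom 7 bits is my assumption about what a rounding stage would add, and the function name is illustrative.

```python
def apply_gain(video, gain128):
    """Multiply 8-bit video by gain128/128 and limit the result to 8 bits."""
    product = video * gain128          # needs up to 16 bits
    rounded = (product + 64) >> 7      # round, then drop the bottom 7 bits
    return min(rounded, 255)           # limiter: never exceed 11111111


print(apply_gain(35, 187))    # 6545/128 = 51 + 17/128
print(apply_gain(255, 255))   # 508 + 1/128, which must be limited
```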
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MORE ON BINARY FRACTIONS 


8-bit video in the range 0 to 255
Video level 97 = 01100001
actually 01100001.     (binary point at the right-hand end)


8-bit video in the range 0 to 1
Video level 97/255 = 01100001
actually .01100001     (binary point at the left-hand end)


The position of the “binary” point is often conceptual and may not be
apparent in the hardware design.


FIG - 12

[Graph: Linear (2-point) Interpolator Frequency Response (half pixel scroll); vertical axis: Gain; trace: Linear Interpolator]
FIG - 33

[Graph: Quadratic (4-point) Interpolator Frequency Responses (half pixel scroll); vertical axis: Gain]
FIG - 34

CCIR 601 LUMINANCE DIGITISING

[Diagram: 8-bit luminance levels, from the top: OVERSHOOT SPACE up to 254, WHITE at 235, UNDERSHOOT SPACE at the bottom]

CCIR 601 CHROMINANCE DIGITISING
(Cb OR Cr)

[Diagram: 8-bit chrominance levels, from the top: OVERSHOOT SPACE up to 254, +350mV PEAK POSITIVE at 240, GREY at 128, -350mV peak negative, UNDERSHOOT SPACE down to 1]

FIG 5


FIG 6


[Diagram label: DIGITAL]


FIG 7


‘BRIGHTNESS’ ADDER


[Diagram: 8-bit VIDEO and 8-bit OFFSET are added and passed through a LIMITER, which
DETECTS SUMS GREATER THAN 255 AND SUBSTITUTES 255]


FIG 9
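
The adder-plus-limiter arrangement is a one-liner in software. A minimal Python sketch, assuming an unsigned (positive-only) offset as in the addition examples; the function name is my own:

```python
def add_offset(video, offset):
    """8-bit 'brightness' adder followed by a limiter."""
    total = video + offset                 # may need 9 bits
    return 255 if total > 255 else total   # substitute 255 for any overflow


print(add_offset(20, 35))    # in range
print(add_offset(230, 35))   # 265 would need 9 bits, so it is limited
```

Without the limiter the 9-bit sum 100001001 would be cut down to its bottom 8 bits, 00001001, turning near-white into near-black.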


‘CONTRAST’ MULTIPLIER


[Diagram: 8-bit VIDEO and 8-bit GAIN feed a MULTIPLIER giving a 16-BIT PRODUCT, followed by a
ROUNDING ADDER, a stage which TRUNCATES the BOTTOM 7 BITS, and a limiter which
DETECTS SUMS GREATER THAN 255 AND SUBSTITUTES 255]


DIGITAL MIXER / KEYER FIG 13


[Diagram: mixer built around a MULTIPLIER]


OUTPUT = (K x A) + ((1-K) x B)


FIG 14


ALTERNATIVE MIXER/KEYER


[Diagram: mixer built around a SUBTRACTOR]


OUTPUT = K (A-B) + B
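
The two mixer forms, OUTPUT = (K x A) + ((1-K) x B) and OUTPUT = K (A-B) + B, are algebraically the same, and the second needs only one multiply. The Python sketch below checks that they also agree in integer arithmetic. The 8-bit key scaled so that 256 means "all A", the rounding constant and the function names are assumptions of mine for the example.

```python
def mix_two_multipliers(a, b, k256):
    """OUTPUT = K*A + (1-K)*B, with K = k256/256 (0..256)."""
    return (k256 * a + (256 - k256) * b + 128) >> 8


def mix_one_multiplier(a, b, k256):
    """OUTPUT = K*(A-B) + B: one subtractor, one multiplier, one adder."""
    # Python's >> rounds towards minus infinity, so this matches the
    # two-multiplier form even when (a - b) is negative.
    return ((k256 * (a - b) + 128) >> 8) + b


print(mix_two_multipliers(200, 100, 64), mix_one_multiplier(200, 100, 64))
```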
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FIG 15 


INTERLACE


[Diagram: the interleaved scan lines of FIELD 1 and FIELD 2]


FIG 16


A FRAME STORE AS A GRID OF CELLS


[Diagram: a grid of cells, 720 PIXELS wide (ACCESSED BY THE HORIZONTAL ADDRESS) and
576 LINES high (ACCESSED BY THE VERTICAL ADDRESS)]


2-PORT STORE


[Diagram: store with VIDEO IN and VIDEO OUT, a WRITE ADDRESS FROM WRITE ADDRESS GENERATOR and a
READ ADDRESS FROM READ ADDRESS GENERATOR; OUTPUT VIDEO shown ON TV MONITOR]


WRITE SIDE MACHINE

[Diagram: WRITE IT WHERE YOU WANT IT, SIMPLE RASTER READ]


READ SIDE MACHINE

[Diagram: SIMPLE RASTER WRITE, "... IT OUT WHEN YOU WANT IT" (first word illegible)]

[Labels common to both diagrams: INPUT PICTURE, IMAGE IN THE FRAME STORE, OUTPUT PICTURE ON TV MONITOR]


FIG 17


FIG 18


"WRITE-SIDE" MACHINE FIG 19

[Diagram: VIDEO IN, VIDEO OUT, WRITE ADDRESS, READ ADDRESS, ADDRESS CALCULATOR, LINE COUNTER]


"READ-SIDE" MACHINE FIG 20

[Diagram: FRAME STORE with IN, OUT, WRITE ADDRESS and READ ADDRESS]

ADDRESS GENERATOR FIG 21

WRITE AND READ STORE ADDRESSING FIG 22


[Diagram: WRITE-SIDE MACHINE, "a" = 2 EXPANDS 2X. STORE LOCATIONS against SUCCESSIVE addresses.
INPUT IMAGE IS STRETCHED AS IT IS WRITTEN INTO THE FRAME STORE]

[Diagram: READ-SIDE MACHINE, "a" = 2 REDUCES 2X. STORE LOCATIONS against SUCCESSIVE READ ADDRESSES.
Labels: THIS SECTION OF INPUT IMAGE, SIZE ON THE OUTPUT SCREEN]

READ-SIDE MACHINE,
CLOCKWISE Z ROTATION FIG 23

“HOLEY” PICTURES

[Diagram: SUCCESSIVE WRITE ADDRESSES: a = 2, with the locations in between marked NOT WRITTEN]


INTERPOLATING PIXELS IN A READ-SIDE MACHINE FIG 26

BI-LINEAR INTERPOLATOR LOGIC FIG 30

[Diagram: SUCCESSIVE WRITE ADDRESSES: ‘a’ = 3]

FIG 37
TYPICAL READ-SIDE DVE
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DIGITAL COMPRESSION 


Nick Lodge (NTL) 


Picture Quality Assessment
and Optimisation Methods for
Advanced Television systems


By N. K. Lodge*


No designer of an image source, transmission, or display
system will need reminding that their ultimate objective is to
provide pictures which are subjectively optimal for the final
human user. It is therefore obvious that methods must be
employed, as part of the system development, which will
determine viewers’ opinions of the subjective quality achieved
and/or the subjective nature of any picture impairment which
may occur as a result of an external influence (e.g. channel
noise). The designer will then wish to feed back the results of
these measurements in a systematic way so that he or she is able
to improve the design by matching it more closely to the requirements
of the viewer and the application. It is a sad fact that
subjective testing methods have not kept pace with advances in
the television systems which they are being used to assess!


Over the years there has been much careful experimentation to arrive at
methodologies by which the opinions of observers on the quality or
impairment of a television picture can be gathered. These specify the
procedural and environmental conditions of the subjective study as closely
as possible so that results obtained on one system, on one occasion,
with one set of observers and in one laboratory, will be comparable with
results obtained elsewhere and at other times. The most recent state of
this work is given in [1]. This Recommendation is updated at yearly
intervals, but, with full published volumes appearing only every 4 years,
it is hardly surprising that the methodologies have struggled to maintain
their usefulness in the rapidly advancing world of television technology.
The advent of the D1 digital VTR has probably given the biggest boost of
recent years to the reliability of subjective assessment methods, since it
has ensured consistency of reproduction over test sessions and has allowed
the creation of libraries of internationally accepted test picture sequences.


*Dr Nick Lodge is with the Independent Television
Commission, Kings Worthy, Winchester SO23
204. Dr Lodge received the BKSTS Dennis Wratten
Award in 1993 for his article “Low Bit Rate
Video Compression Techniques” published in Image
Technology, December 1992.


Image Technology September 1994


Recently the broadcasting world has seen the development and standardisation
of a large number of television picture processing systems intended for
professional and domestic transmission applications, as well as for use
within the studio. Examples of these are numerous: HD-MAC 34Mbit/s
inter-studio contribution codecs, PALplus, ‘non-linear’ editing facilities,
digital videotape machines, standards converters, noise reducers and slow
motion interpolators. One feature that all these processes have in common
is that they are adaptive, that is they alter their behaviour depending upon
the content of the picture, or part of the picture, that they are handling
at any particular time. For efficient image transmission or storage,
processes which rely on the reduction of picture signal redundancy,
there are good theoretical reasons why this adaptivity is essential.

Redundancy reduction is not only 
concerned with the more obvious forms of 
statistical redundancy, such as the likely 
similarity between successive picture 
frames, but also with psychovisual redun- 
dancy - the exploitation of the inadequacy 
of the human visual system to perceive 
certain types of distortion against a back- 
ground of certain local picture content. 
Optimisation of these systems requires 
extensive use of specialised subjective test 
procedures. 


The presence of adaptivity in television processing means that attempts to
relate objective television measurements to subjective quality are no longer
useful because the nature of artefacts which can arise is so varied. Even
employing existing subjective assessment methods is not totally reliable,
because adaptive systems usually exhibit scene-dependent quality and this
raises the question of how a few ten-second (10s) picture sequences should
be chosen to be representative of typical broadcast television content.

More than ever before there are strong commercial reasons for taking
subjective measures seriously. Collaborative as well as competitive research
programmes frequently organise subjective assessment campaigns to determine
which of several proponent systems should be selected for further development
or standardisation; the financial implications of losing these competitions
can be considerable. Even after implementation, service operators will be
faced with decisions about system trade-offs such as the quality vs quantity
of television services which should be offered through a fixed capacity
channel. Buyers of professional studio equipment too, may also organise
subjective comparisons between competing systems to assist in their choice.
Failure to judge the requirements of customers or to take account of the
subjective performance of competing systems or services, could well spell
commercial disaster.


Design of Subjective 
Assessment Tests 


In an ideal world, psychologists and 
engineers would have published a single 
method for the reliable and sensitive, sub- 
jective assessment of any television system. 
Neither the application of the method nor 
the processing of the data, would require 
any specialist advice, and the results would 
be immediately representative of the opin- 
ions of normal viewers watching television. 
The reality today is somewhat different. 

Despite the ‘user friendliness’ of Rec 500-4, which advises how to choose an
appropriate method from a limited set of options, questions will still arise
in the mind of the non-specialist: How should the ‘reference’ picture
condition be chosen in an impairment test? Will revealing clues in the nature
of certain distortions, invalidate the methodology? How should observers be
briefed before each session? How many observers are necessary to ensure the
statistical significance of the results? When is it appropriate to use an
‘anchor’? Remember too, that subjective assessment is expensive, not only in
terms of observer time, but also in picture sequence preparation, videotape
editing and statistical analysis. It is little wonder that system designers
embark far too infrequently upon subjective evaluation to guide their work.

Let us examine the key elements 
involved in subjective test design: 

Basic methodology Here it must be considered whether quality or impairment
should be measured, or whether a comparison between two or more systems
should be made, perhaps in order to rank their subjective performances. It
will be necessary to decide if a single or double stimulus method is more
appropriate and how the ‘test’ and ‘reference’ (if used) conditions should be
presented. For the benefit of the observers’ attentiveness, and therefore
test validity, it is important to ensure that the test duration is not
excessive.

Viewing conditions So that results can be compared with, or contributed to,
assessments performed elsewhere, it is essential to adopt standard conditions
for room illumination, monitor set-up and the seating of observers with
respect to the monitor.

Choice of observers It will be necessary to decide how observers should be
screened (e.g. for visual acuity, colour blindness) prior to their
participation and whether there are certain classes of ‘expert viewers’ who
should be excluded.

Scaling method A key issue in the regis- 
tering of observers' votes is to decide which 
scale they should be recorded against. 
There are many scales from which to 
choose, each having particular advantages 
and disadvantages. 

Reference conditions The choice of a 'ref- 
erence’ picture condition for comparative 
or double-stimulus testing is not always a 
simple matter. What for example would 
one use in the subjective evaluation of a 
film to video transfer system? 

Presentation timing The pattern and duration of the presentation of picture
sequences, and the period allowed for voting must be carefully determined and
is a compromise between: allowing sufficient time for observers to make
reliable judgements, but not so long that their memories of earlier
conditions have faded, and not so long that the total session time becomes
excessive. Also allowing sufficient repetition of test picture sequences that
various factors under study can be explored, but not so much that observers
become over-familiar with them.

Test picture scenes This is a particularly 
important issue, which will be thoroughly 
considered later. The picture material used 
must be chosen scientifically, so that on the 
one hand it is demanding for the system 
under test, but on the other hand it should 
be understood just how representative of
real television content it is. A set of scenes 
could easily be chosen, for example, having 
a predominance of colour transitions and 
high frequency stripes, to demonstrate that 
the PAL television system is completely 
unusable for broadcasting. 

Analysis of voting Obtaining the 
required information from the raw votes of 
the observers is, of course, essential but it is 
important to understand what processing 
may be validly done on the data and what 
limitations may be imposed by the basic 
methodology. Is it possible, for example, to
identify those observers who have been
inattentive or confused, and remove their
votes from the analysis? Remember too,
that analysis of the results not only yields 
the mean opinion scores sought, but also 
variances and significance information 
which is vital in interpreting the results and 
ultimately judging how successful the 
evaluation has been. 

Results presentation Choosing the most
appropriate way in which to present the 
results of subjective evaluations is import- 
ant, both for ease of interpretation and for 
purposes of comparison between different 
experiments. It is currently being con- 
sidered whether the processing and presen- 
tation of the results of subjective 
evaluations should be a subject of stan- 
dardisation. 


Current Methodologies 


There are two methods which together
account for the vast majority of subjective
assessments performed today: the double
stimulus impairment scale (DSIS) and the
double stimulus continuous quality scale
(DSCQS) methods. There is no rigid
advice on whether one should choose a
‘quality’ or an ‘impairment’ test in a
particular situation. Generally, if an
experimenter is causing different amounts
of degradation to a picture, and the relevant
issue is the difference between the original
and the distorted versions, then an
impairment scale should be used. Where
one or more systems are tested under
normal operating conditions (no external
distortions are introduced), so that interest
lies in the fidelity of the reproduced picture
with respect to the source, then a quality
scale should be used. Occasionally a
sensitive differentiation between pairs of
systems is required. Here a comparison
scale may be employed and can also prove
useful for rank-ordering small numbers of
systems.
The double stimulus impairment
scale - This method uses a cyclic
presentation where observers are first
shown an unimpaired reference picture
sequence (for ~10s) and then the same
scene subjected to the impairment under
test (for ~10s). They are informed which of
the pictures is the reference and which the
test condition, and then asked to vote on
the second, keeping the first in mind.
Throughout the session, which should last
no longer than 2 minutes, all impairments
of interest are shown to the observers in a
random order of combinations covering all
the test scenes. This time permits 40
presentations to be made. The unimpaired
picture is also included as one of the
assessed conditions, and all presentations
are repeated twice within the session. Care
is taken to ensure that, although the
presentation order is random, the same
scene is never used on two successive
occasions.

Voting is on the basis of the observer’s 
‘overall impression’ of each test picture, 
and is recorded using one of five discrete 
impairment grades: Imperceptible, Per- 
ceptible but not annoying, Slightly annoying, 
Annoying, Very annoying. For processing, 
each grade is assigned the numerical repre- 
sentation 1-5, and results are presented by 
mean and standard deviation for each test 
parameter. 
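
As a minimal sketch of the processing just described, the following turns a set of impairment votes into a mean and standard deviation for one test parameter. The direction of the numerical mapping (5 for Imperceptible down to 1 for Very annoying) is an assumption, since the text says only that the grades are represented as 1-5; the votes are hypothetical.

```python
from statistics import mean, stdev

# Assumed mapping: the text states only that grades are represented as 1-5.
GRADE_VALUE = {
    "Imperceptible": 5,
    "Perceptible but not annoying": 4,
    "Slightly annoying": 3,
    "Annoying": 2,
    "Very annoying": 1,
}


def summarise_dsis(votes):
    """Return (mean, sample standard deviation) for a list of grade labels."""
    scores = [GRADE_VALUE[v] for v in votes]
    return mean(scores), stdev(scores)


# Hypothetical votes from five observers for a single test condition.
example_votes = [
    "Imperceptible",
    "Perceptible but not annoying",
    "Perceptible but not annoying",
    "Slightly annoying",
    "Perceptible but not annoying",
]
dsis_mean, dsis_sd = summarise_dsis(example_votes)
print(f"mean={dsis_mean:.2f} sd={dsis_sd:.2f}")
```

The sample standard deviation is used here; whether a sample or population figure is intended is not stated in the text.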

The double stimulus continuous
quality scale - This is another cyclic
method employing pairs of picture
sequences. One picture of the pair is
directly from the source and the other
results from the source signal after it has
passed through a system under test
(although the condition where both
pictures are from the source is also used).
These pictures are randomly designated A
and B, and shown (for ~10s each) to the
observers, who are asked to grade both,
with no knowledge as to which is the source
and which the test picture. Throughout a
session, which should be limited to 30
minutes, all combinations of systems under
test and scenes are shown, not only in a
random order, but also twice, where the
designations to A and B are reversed on the
second occasion to eliminate bias.

Voting is performed by the observer 
making a mark at an appropriate point on a 
continuous scale for both pictures A and B. 
Fig. 1 illustrates a portion of the voting
form where the five adjectives: Excellent,
Good, Fair, Poor, Bad are shown as a guide
for the observer. Statistical processing of
the presentations is usually based on the
measured difference between the marks on
scales A and B expressed as a percentage of
the scale length. The differences are
re-converted to equivalent quality grades,
and mean scores for the source and test
conditions are presented for each
combination of variables.
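
The difference measure can be sketched as below. This is an illustration under stated assumptions rather than a prescribed procedure: marks are taken to be measured in millimetres from the 'Excellent' end of the scale (so a larger value means a worse picture), the scale length of 100 mm is hypothetical, and the mark pairs are invented for the example.

```python
from statistics import mean

SCALE_LENGTH_MM = 100.0  # hypothetical printed scale length


def percent_difference(source_mark_mm, test_mark_mm,
                       scale_length_mm=SCALE_LENGTH_MM):
    """Difference (test - source) as a percentage of the scale length.

    Marks are assumed to be measured from the 'Excellent' end,
    so a positive result means the test picture was judged worse.
    """
    return 100.0 * (test_mark_mm - source_mark_mm) / scale_length_mm


def mean_difference(mark_pairs):
    """Mean percentage difference over (source, test) mark pairs."""
    return mean(percent_difference(s, t) for s, t in mark_pairs)


# Hypothetical (source, test) marks from three presentations of one condition.
mark_pairs = [(8.0, 20.0), (12.0, 30.0), (10.0, 16.0)]
dscqs_mean_diff = mean_difference(mark_pairs)
print(f"mean difference = {dscqs_mean_diff:.1f}% of scale")
```

Because the observer does not know which of A and B is the source, the experimenter has to restore the source/test assignment before forming each pair.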

The use of adjectival descriptors has
been criticised on two grounds. First, the
perceptual intervals between the terms
used are known not to be equal [2], with for
example, poor and bad being perceived to
be much closer than excellent and good.
Secondly, when translated into other
languages the terms give rise to yet different
perceptual intervals, so that for example,
mässig and schlecht are further apart in the
mind of a German than poor and bad are in
that of a native English speaker. This has
consequences for the validity of subsequent
linear statistical processing and
international subjective evaluation
campaigns.

Fig. 1. Portion of the double stimulus continuous quality scale voting form.

Single Stimulus Methods - Are those
in which no comparison with an
unimpaired reference condition is invited
during every presentation, ie only one
picture sequence is shown each time. These
are not as [...] as double stimulus methods
but are most often used whenever an
appropriate reference sequence cannot be
derived. They do have the advantage that
they are quicker to perform, but are known
to be sensitive to the range and distribution
of the conditions shown. This problem can
be partly alleviated by including in the
presentation ‘anchor’ sequences, which
represent the most extreme condition
displayed.

The definitions of standard viewing
conditions for all methods are given in [1]
for conventional TV, and [3] for HDTV.
These references also advise on the choice
of observers, as does [4] pp 45, 46.


What is Wrong with
Current Methodologies?


For evaluating many established systems,
the answer to this question is ‘nothing’,
however recently two factors have
completely changed the environment in
which we wish to apply subjective
evaluation methods. The first is the advent
of adaptive systems (eg digital
compression), which give rise to
scene-dependent quality, as was noted in
the introduction: existing methods cannot
meaningfully describe this behaviour. The
second is that technological advances mean
that adaptive systems are no longer solely
the domain of the studio, but are poised to
be used in direct-to-home delivery. The
consequence of this is that we may wish to
take a commercial rather than professional
view of picture quality: no longer shall we
consider laboratory viewing conditions, but
domestic ones instead; and no longer shall
we aim to deliver excellent quality across all
types of picture material, but shall ask such
questions as ‘how much occasional
distortion will the viewer tolerate before he
will switch to another service?’. In
ascertaining the worth to the viewer of
picture quality, against say, number of
available channels or waiting time for a
near-video-on-demand movie, it may well
be that we would wish to get away entirely
from existing methods and employ
marketing-type comparisons along the
lines: ‘would you swop two packets of
Brand X for your regular soap powder?’

Let us briefly examine the failings of
existing methods in this new environment
[5]:

i) Test picture sequences are too
short - Existing methods use sequences of
10-15s duration; this is not only
inadequate to judge scene-dependent
quality variation, but is also very different
from real domestic viewing, where scenes
are watched in the context of a programme
lasting tens of minutes.

ii) Choice of test material is
unscientific - Choice of test material has
usually been governed by politics,
availability, and vague concepts, such as
that it should be ‘critical but not unduly so’
(CCIR Rec 500-4). No attempts have been
made to ensure that it is statistically
representative of real TV content.

iii) Comparisons are usually made
with a ‘reference’ - In the usual double
stimulus procedures, test pictures are
compared with an unimpaired reference.
While this results in sensitive laboratory
evaluations, such side-by-side comparisons
are not available to the domestic viewer,
who would very probably not notice some
occasional distortions.

iv) Test picture sequences are
repeated - Existing methods employ
repetition of test sequences, so that
observers have time to seek out particular
details in the scene which reveal distortion
most readily. They then base subsequent
judgements solely on these. In real TV
viewing, scenes are shown only once (or
with sufficient time in between to forget
details), so intimate familiarity with scene
content does not have time to develop.

v) Viewing distances are short -
Studies of domestic viewing conditions
consistently show that most viewers sit at a
far greater distance from their screens than
the existing 4H and 6H assessment
standards. This means, for example, that
some occasional resolution variation would
probably not be noticed by the domestic
viewer.


Progress Towards a
New Methodology


Scene-dependent quality. The first step
in developing a new methodology is to
understand how the performance of
advanced systems, which exhibit
scene-dependent quality, can be
characterised. Certainly the most common
of these systems, and the one which will be
concentrated upon here, is digitally
compressed television. The compression
process, explained in the tutorial paper [6],
operates by removing redundant
information, such as the similarity between
adjacent frames and pixels, from the picture
signal. A model of this is shown in Fig. 2.

Fig. 2. Model of a digital television compression encoder.

Notice that the system employs a control
loop to ensure that the buffer store, which
matches the irregular redundancy-reduced
data entering it to a constant rate channel,
does not overflow. It does this by
introducing into the transmitted picture
some degree of distortion which is tuned to
the properties of the human visual system in
such a way that for most television scenes it
remains imperceptible. Occasionally,
however, very busy scenes will occur in
normal television which require a large
number of bits for their adequate
reproduction; these will stress the buffer
store and will consequently appear
noticeably distorted at the receiver.
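
The behaviour of such a control loop can be illustrated with a toy simulation. This is only a sketch and not the encoder of Fig. 2: the linear link between buffer occupancy and distortion, the `max_saving` figure and all bit counts are assumptions chosen to show the feedback effect.

```python
def simulate_rate_buffer(scene_bits, channel_bits, buffer_size, max_saving=0.8):
    """Toy rate-buffer control loop (illustrative assumptions throughout).

    scene_bits   -- bits each frame would need for undistorted coding
    channel_bits -- bits the constant-rate channel removes per frame
    Returns a list of (occupancy, distortion) pairs, one per frame.
    """
    occupancy = 0.0
    history = []
    for demand in scene_bits:
        # Feedback: the fuller the buffer, the more distortion is applied.
        distortion = occupancy / buffer_size  # ranges from 0.0 to 1.0
        # Assumed effect of distortion: coarser coding produces fewer bits.
        produced = demand * (1.0 - max_saving * distortion)
        # The buffer fills with coded bits and drains at the channel rate.
        occupancy = occupancy + produced - channel_bits
        # Clamp to the physical limits of the store.
        occupancy = min(buffer_size, max(0.0, occupancy))
        history.append((occupancy, distortion))
    return history


# Hypothetical scenes: an easy one below the channel rate, a busy one above it.
easy = simulate_rate_buffer([80] * 5, channel_bits=100, buffer_size=400)
busy = simulate_rate_buffer([300] * 5, channel_bits=100, buffer_size=400)
print([round(d, 3) for _, d in easy])
print([round(d, 3) for _, d in busy])
```

In this sketch the easy scene never raises the buffer level and so is coded without added distortion, while the busy scene drives the distortion upwards frame by frame until the coded rate approaches the channel rate.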

The first question which one might seek 
to answer then, is: given a particular com- 
pression method operating at a particular 
bit-rate, how often and to what degree will 
subjectively noticeable distortion occur on 
typical TV programmes? One way to
answer this would be for a panel of
observers to vote on the quality of many
thousands of test scenes which have passed
through a compression system. The
resulting distribution of percentage
occurrence vs subjective quality could then
describe the scene-dependent quality
variation. Naturally this is impractical, but
the same result can be approached by a
more realistic means:

Rather than employ the observers to
grade all the scenes, a rank order can first
be established according to the difficulty
which a particular compression system
would experience when presented with
each scene (measured by using the
rate-buffer occupancy or a similar analytical
approach). The observers then need only
assess compressed versions of a
representative sample of the ranked scenes.
This method was attempted in an
experiment which employed 27,369 short
scenes recorded automatically on D1 tape
over a week, from 5 channels carrying
component television. These were the
former BSB
channels: ‘Power Station’ (pop music 
videos), ‘Galaxy’ (general entertainment & 
news), ‘Sports Channel’, ‘Movie Channel’, 
and ‘Now’ (news & features). Not only did 
this set of scenes provide a wide range of 
material, but it also allowed analysis chan- 
nel by channel, to discover if some carried 
more demanding material than others. 
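
The idea of ranking first and then assessing only a sample can be sketched as follows. How the representative sample was actually drawn is not stated in the text; picking scenes evenly spaced through the ranking is an assumption made here, and the scene names and criticality values are hypothetical.

```python
def representative_sample(criticality_by_scene, n):
    """Rank scenes by criticality and pick n evenly spaced through the ranking."""
    ranked = sorted(criticality_by_scene, key=criticality_by_scene.get)
    if n >= len(ranked):
        return ranked
    if n == 1:
        return [ranked[0]]
    # Spread the n picks from the least to the most critical scene.
    step = (len(ranked) - 1) / (n - 1)
    return [ranked[round(i * step)] for i in range(n)]


# Hypothetical criticality measurements (bits/pixel) for ten scenes.
criticality = {f"s{i}": 0.5 + 0.25 * i for i in range(10)}
chosen = representative_sample(criticality, 4)
print(chosen)
```

Only the chosen scenes then need to be compressed and shown to observers; the remaining scenes inherit an approximate grade from their position in the ranking.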

The resulting rank ordering was
produced in days using a large parallel
processing computer which simulated the
behaviour of a normalised
motion-compensated hybrid DCT
compression codec employing subjectively
optimised quantisers (typical of
MPEG-type systems). The cumulative
distribution of ranked scenes itself is a
valuable tool since it permitted, for the first
time, a calibration to be made of the
CCIR/EBU standard library of test scenes
in terms of typical television content. To do
this, the standard scenes were subjected to
the analysis program, and their resulting
‘criticalities’ marked (those not shown in
parentheses) onto the cumulative
distribution (Fig. 3). It proved to be a
revelation that, despite the fact that these
scenes had been very widely used in all
international studies on digital
compression, none was representative of
the most critical 10% of scenes occurring in
typical television: 6 minutes in every hour
would look worse than anything that had
been seen in evaluations performed using
the library! The scenes shown in
parentheses were examples identified from
the 27,369-scene sample.

Fig. 3. Cumulative criticality distribution for 27,369 television scenes. (Axes: % cumulative frequency, all 5 channels, against criticality in bits/pixel.)
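
Placing a library scene on the cumulative distribution amounts to finding what percentage of sampled scenes are no more critical than it. The sketch below shows this, together with the conversion from a percentage of scenes to minutes per hour, which assumes that scenes are of similar duration. The criticality values are hypothetical and are not taken from Fig. 3.

```python
from bisect import bisect_right


def cumulative_percent(sorted_criticalities, value):
    """Percentage of sampled scenes whose criticality is <= value."""
    position = bisect_right(sorted_criticalities, value)
    return 100.0 * position / len(sorted_criticalities)


def minutes_per_hour_above(cumulative_pct):
    """Minutes in each hour occupied by scenes more critical than this point.

    Assumes scenes of similar duration, so a share of scenes
    equals a share of viewing time.
    """
    return 60.0 * (100.0 - cumulative_pct) / 100.0


# Hypothetical sorted criticalities (bits/pixel) from a scene sample.
sample = [0.5, 0.8, 1.0, 1.2, 1.5, 1.7, 2.0, 2.4, 3.0, 3.5]
library_scene = 2.0  # hypothetical criticality of one library scene
pct = cumulative_percent(sample, library_scene)
print(pct, minutes_per_hour_above(pct))
```

A library whose most critical scene sits at the 90% point leaves 10% of typical content, or 6 minutes in every hour, more demanding than anything it contains.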

In the frequency distribution of the same
results (Fig. 4) some interesting
characteristics are revealed, in particular
the presence of three distinct peaks.
Examination of the source material showed
that the small left-hand one was due to the
presence of captions, which being
computer-originated, exhibited very little
noise and achieved low criticality scores.
This peak was strongest in the ‘Sports
Channel’ statistics. The second peak
consisted of scenes of talking presenters and
newsreaders. These were characterised by a
still camera, no background movement,
and were often live and so contained low
noise levels. This peak was strongly evident
in the channels ‘Now’ and ‘Galaxy’ which
carried news, and was totally indiscernible
in the ‘Movie Channel’ statistics. It is
perhaps most interesting because it is
representative of the statistics of
videoconference scenes, which had not
previously been compared in the same
graph with those of entertainment
television. The third major peak is
indicative of the spread of criticality in
general television scenes. Its character did
not vary greatly across the channels,
however, the ‘Sports’ and ‘Power Station’
channels did exhibit a slightly higher
proportion of very critical scenes, with
almost all of the most critical 1% of scenes
being attributable to ‘Power Station’.

Fig. 4. Distribution showing scene-dependent subjective quality variation for MPEG1+ at 5 Mbit/s. (Axes: number of television scenes against entropy-based criticality measure in bits/pixel. Annotations: newsreader-type scenes; 84% of all scenes, grades 5-4; 15% of all scenes, grades 4-3; 1% of all scenes, grades 3-2.)
Subjective Calibration


A representative sample of the ranked
scenes was compressed to the bit-rates of
interest in order to obtain subjective quality
grades for them. Here the DSCQS method
was used so that the results obtained could
be interpreted in terms of the many
documented tests which have employed
this method. In the assessments, three
DCT-based compression approaches were
used; the results of one, the MPEG-1+
algorithm at 5 Mbit/s, are shown in Fig. 4.
This characterisation of variable quality
revealed that 84% of scenes were
reproduced in the ‘excellent’ range (0-20%
of the DSCQ scale), 15% were in the ‘good’
range (20%-40%), and 1% of scenes fell
into the ‘fair’ range (40%-60%).
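
The banding used in this characterisation can be sketched as follows. The first three bands are those quoted above; extending the same 20% steps to 'poor' and 'bad', and assigning a value on a boundary to the upper band, are assumptions. The difference values are hypothetical.

```python
from collections import Counter

# Upper limits (% of DSCQ scale) for each band; the 'poor' limit and the
# final 'bad' band are assumed to continue the 20% steps quoted in the text.
BANDS = [(20.0, "excellent"), (40.0, "good"), (60.0, "fair"), (80.0, "poor")]


def grade_band(percent_diff):
    """Map a DSCQS percentage difference to an adjectival band."""
    for upper, name in BANDS:
        if percent_diff < upper:
            return name
    return "bad"


def band_percentages(percent_diffs):
    """Percentage of scenes falling into each band."""
    counts = Counter(grade_band(d) for d in percent_diffs)
    total = len(percent_diffs)
    return {name: 100.0 * n / total for name, n in counts.items()}


# Hypothetical DSCQS differences for five assessed scenes.
diffs = [5.0, 12.0, 18.0, 25.0, 45.0]
shares = band_percentages(diffs)
print(shares)
```

Applied to the scenes of a representative sample, and weighted by how much of the ranking each sampled scene stands for, a tally of this kind gives figures of the form quoted above.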

This measure is invaluable for
comparing the performance of different
systems since it is determined across a very
wide range of different picture material. It
should, however, be interpreted with care,
since statistics can be considerably different
in whole atypical programmes. As an
example of this, consider watching
television through a system which does not
reproduce well thin black lines moving
against a white background. While for most
programmes this may not be a severe
handicap, imagine trying to enjoy a
half-hour slalom skiing contest, where the
poles defining the gates not only become
worst-case features, but also demand the
visual attention of the viewer. Do not forget
too television advertisements, which
employ dynamic presentations and rapid
scene changes in order to attract and
interest the viewer. These may also be
broadcast frequently enough for viewers to
become familiar with them. They are likely
to be more critical than typical television;
however, since they represent considerable
investment on the part of the advertiser and
income on the part of the broadcaster, both
are likely to be intolerant of any artifact
whatever.

It is also worth pointing out that some of 
the scenes discovered in the large sample 
(eg Door and Supermarket) are extremely 
critical because they contain an unusually 
high level of source noise, which to a 
compression system appears as a high 
source information content. Such scenes 
are however, not suitable as reference 
sequences in double stimulus tests because 
after compression they do not look signifi- 
cantly different from before, even though 
they will actually have undergone consider- 
able distortion. The A-B difference 
measure in the DSCQS assessment for 
such scenes, will therefore be very small. 


Viewer Tolerance to 
Quality Variation 


So far we have achieved a characterisation
of the proportion of television scenes which
are likely to exhibit distortion, but do not 
know how tolerant the viewer will be of 
this. Clearly the durations of presentations 
in existing subjective assessment methods 
will be too short for meaningful judge- 
ments to be made. The best approach to 
determining this in a commercial environ- 
ment is to employ a method which involves 
exposing viewers to entire programmes 
which have passed through a system under 
test. This is now becoming a practical 
proposition because flexible prototype 
hardware is more readily constructible, and 
even where it is not, simulated impairment 
of a small proportion of scenes can be per- 
formed by computer. 

Such a method will introduce a range of 
distractions, which although typical of 
domestic viewing, may make it difficult to 
measure the influence of picture quality 
alone. Studies of some of the influences are 
in progress and are aiming at answering 
such questions as: to what extent will the 
viewer excuse occasional poor scenes when 
the vast majority have excellent quality?; 
will the viewer’s final and overall impres- 
sion of programme quality be based largely 
upon the quality of scenes shown towards 
the end of the programme?; and to what 
extent will interest in parts of the pro- 
gramme content influence the annoyance 
of distortion? The method used for
recording viewers’ opinions will form a vital
element of a new methodology. One
approach is to give each viewer a
continuously variable indicator knob, with
which they can register the level of their
dissatisfaction whenever the picture quality
declines. This could perhaps be scaled with
respect to some overall rating given by each
viewer at the end of the programme.
Methods such as this have been used for
many years in continuously assessing the
enjoyment or dissatisfaction of viewers with
the content of pilot programmes and
commercials.

Fig. 5. System for the subjective and objective characterisation of picture quality in compression systems.

Fig. 5 shows a flexible arrangement for 
conducting subjective assessment and 
optimisation processes. Play-out of source 
material, which could be in the form of a 
complete television programme or a large 
number of short sequences, is controlled by 
a small personal computer. As the tape
plays, parameters of the system, such as the
occupancy of the rate-buffer in a digital
coder, can be logged against time-code to
enable the compilation of the system's
statistical response to the input scenes. The 
system also has the capability to record the 
continuous or discrete opinions of a 
number of viewers. The statistical process- 
ing and relating of these data can then be 
performed rapidly to produce the required 
characterisations and performance meas- 
urements of the system under test. After 
the test, the response of one or all observers 
can be displayed on-screen as the tape is 
replayed, in order to discuss or review the 
session. 

The material being played from the tape
can, of course, be chosen from a particular
class of material (eg sporting events or pop
music ‘videos’) or can contain pictures
which have already been passed through
other potential sources of distortion such as
34 Mbit/s contribution systems or
non-linear editing suites. This allows
assessments to be carried out in the context
of a practical studio environment where
cascading of systems will certainly occur.


The use of a tool such as this is not only 
important for investigating and optimising
system performance in the laboratory, but 
it also provides a convenient means of 
permitting the specification of the ade- 
quacy of systems for use in the studio. 
Already, digital editing systems, originally 
designed for off-line working, are being 
claimed by their manufacturers to be suit- 
able for on-line application and to-date the 
quality compromise involved is not under- 
stood. 


Subjective Optimisation 


We have looked at how subjective
assessment methods are progressing to
cope with commercial demands and
advancing technology. So far we have not
explored how a system designer can make
use of the results from the assessments to
optimise a design. It is difficult to generalise
about this, since it is often the case that
specialised approaches have to be designed
to find optimal values of certain
parameters - for many of these approaches
the flexible arrangement of Fig. 5 will be
both applicable and very efficient.

A requirement of most designers is to
find test scenes which are demanding for
their systems, so that they have convenient
material on which to concentrate. In the
case of digital compression, it is here that
occasional violation of an assumption made
in the approach of the previous section
proves to be useful.

In section 5, a rank-ordering of a large
number of scenes was described according
to the occupancy of the encoder
rate-buffer, and this was assumed to be the
same as the rank-ordering of the subjective
quality of compressed scenes which would
be experienced by a viewer. This ordered
sequence was then ‘calibrated’ by some
subjective tests. Referring to the model of
Fig. 2, where the buffer fill-level directly
controls distortion, this assumption must
be true by definition provided that the
encoder is perfectly subjectively optimised.
Some scenes, however, will appear
subjectively worse than they should do,
when compared with the majority of other
scenes having the same level of criticality
(rate-buffer occupancy), and it is the worst
of these scenes which provide a powerful
pointer to the inadequacy of the subjective
optimisation of a system. Poorly
subjectively optimised scenes can easily
appear worse than scenes which have a
much higher statistical criticality measure.
Remember too that there is much evidence
that viewers in subjective assessment
sessions judge the quality of a picture by its
worst noticeable part, so local image
characteristics which reveal deficiencies in
subjective optimisation will have a larger
impact than might be expected.

Another important point here is that
poorly subjectively optimised scenes can
appear more frequently and/or look worse
than scenes which have a much higher
statistical criticality measure. Simply
searching for statistically busy scenes on
which to optimise compression algorithms
is not therefore the best strategy for
improving performance.

Fig. 6. Identification of scenes which reveal deficiencies in the subjective optimisation of compression systems.

Fig. 6 presents an illustrative plot of
DSCQS subjective quality vs criticality,
which shows how scenes which reveal poor
subjective optimisation can be identified as
those which deviate to the greatest extent
from the average. Points obtained from the
study described in section 5 have been
marked and reveal the scenes Renata and
Mobile & Calendar to exhibit poor
subjective optimisation. Interestingly, these
scenes are well established in the folklore of
compression as being ‘difficult’, but it has
previously been assumed that they are
statistically demanding or critical, which is
not especially the case. The method of
plotting DSCQS subjective quality against
criticality also provides a means of
differentiating those source scenes which
are noisy.
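
One way to automate this identification is sketched below: a straight line is fitted to the quality-versus-criticality points and scenes lying well above it (worse than the trend) are flagged. Taking 'the average' to be a least-squares line is an assumption, as is the threshold; the scene names and values are hypothetical and are not the points of Fig. 6.

```python
def flag_poorly_optimised(scenes, threshold):
    """Flag scenes whose DSCQS difference lies well above the trend line.

    scenes    -- dict of name -> (criticality, dscqs_percent_difference)
    threshold -- minimum excess over the fitted line, in % of scale
    """
    xs = [c for c, _ in scenes.values()]
    ys = [q for _, q in scenes.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Ordinary least-squares fit of quality difference against criticality.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sorted(
        name
        for name, (c, q) in scenes.items()
        if q - (slope * c + intercept) > threshold
    )


# Hypothetical points: four scenes on a common trend and one far above it.
points = {
    "scene_a": (1.0, 10.0),
    "scene_b": (2.0, 20.0),
    "scene_c": (3.0, 30.0),
    "scene_d": (4.0, 40.0),
    "scene_x": (2.5, 45.0),
}
flagged = flag_poorly_optimised(points, threshold=10.0)
print(flagged)
```

A scene lying well below the trend would, by the argument of the text, be a candidate for a noisy source scene rather than a poorly optimised one.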
The arrangement of Fig. 5 can
conveniently be used to identify scenes
which are poorly subjectively optimised. In
this case the tape would contain test
sequences under evaluation, avoiding those
with a high noise content. Once found, it is
the job of the system designer to identify
which characteristic of the image is
embarrassing the system, and to devise a
scheme to improve its reproduction. In
image compression, subjective
optimisation is concerned with achieving
the most appropriate allocation of
bit-capacity throughout the picture so that
for a given rate-buffer occupancy,
subjective quality is constant. Picture
characteristics which are not handled well
can usually be improved by deriving some
function to detect them, and then by
adaptively varying the distortion strategy
(eg by selecting a finer quantiser) in their
locality.


Conclusions 


This paper began by reviewing the most 
common current methods for the subjec- 
tive assessment of television pictures and 
discussed their inadequacy to describe 
meaningfully the performance of advanced 
television systems. In particular it exam- 
ined the important case of digitally com- 
pressed television, where quality will not be 
constant, but will to some extent be depen- 
dent upon the content of the scene being 
handled at a particular time. A method for 
characterising the subjective quality of such 
systems was presented and demonstrated 
using an implementation of the MPEG1+ 
system. The commercial importance of 
understanding the viewer’s tolerance to 
occasional distortion was also discussed, 
and possible new approaches for
investigating this using continuous voting
procedures were described. Finally, a
method was described for identifying those
scenes which reveal deficiencies in the
subjective optimisation of compression
schemes. Targeting work on the
optimisation of these scenes, through the
careful design of bit-allocation processes, is
considered to be particularly important for
improving compression systems because
poorly optimised scenes are likely to occur
more frequently in typical television than
scenes which exhibit high statistical
criticality.
Low Bit-Rate Video


Compression Techniques 


by N. K. Lodge


It is surprising, if we think of such advances as digital image transmission
to be innovations of perhaps the last twenty or thirty years, to learn that
the first digital picture to travel across the Atlantic between London and
Halifax, Nova Scotia was sent by cable in 1922 [1]. The scanning and
reproduction of the two-level photograph were performed automatically
but off-line, with punched paper tape being manipulated by the operators
at both ends of the link. A regular service operated in this way, conveying
pictures for the newspaper industry which by 1930 was capable of
handling 15 grey-levels and used an offset sampling structure similar to
that used in today’s HDTV transmission schemes. With a 4:2:2 digital
television studio producing paper tape at a rate of 50 miles/sec, it is
fortunate that we now have more efficient interfaces between picture
sources and digital channels!


One factor which existed in the 1920s and
is becoming increasingly important
today is the economic demand to make
the most efficient use of available channel
or storage capacity. In the transmission of
television this means that we are interested
either in conveying more television services
through a limited channel, or conveying the
same number but with higher resolution. It
is very likely that, even in the future where
optical fibre offers the promise of cheap
transmission bandwidth, there will remain
an important place for bandwidth-efficient
image coding techniques in more restricted
media such as the RF emission and all
currently conceivable storage mechanisms.
Whether employing analogue or digital
transmission, systems which efficiently code
television pictures operate by removing
redundant information from the image
signal prior to transmission and then
reinserting it at the decoder. This
redundancy manifests itself in two forms:


Statistical redundancy - picture sample-values are not independent, but
are correlated with their neighbours in the same line, the previous line
and the previous frame. This means that the level of the signal at any
time is to some extent predictable from its past.

Psychovisual redundancy - picture sample-values do not always have to be
reproduced at the receiver with the fidelity with which they were
represented at the encoder if the signal is destined for the human
observer (and not for further studio processing, for example). This is
because the human visual system exhibits some tolerance to distortion,
where the level of tolerance is dependent upon the nature of the picture
in the locality.
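
The predictability described under statistical redundancy can be illustrated with a minimal sketch. It assumes the simplest possible predictor (each sample is predicted by its left-hand neighbour) and a made-up scan line; the function names and sample values are illustrative only and are not taken from any particular coder.

```python
def previous_sample_residual(line):
    """Replace each sample by its difference from the left-hand neighbour."""
    # The first sample has no neighbour, so it is kept unchanged.
    return [line[0]] + [cur - prev for prev, cur in zip(line, line[1:])]


def rebuild(residual):
    """Invert previous_sample_residual by accumulating the differences."""
    line = [residual[0]]
    for diff in residual[1:]:
        line.append(line[-1] + diff)
    return line


# Hypothetical 8-bit scan line: a slow ramp, as in a gently shaded area.
line = [100 + i // 2 for i in range(16)]
residual = previous_sample_residual(line)
```

For this line every prediction error after the first sample is 0 or 1, so the residual needs far fewer bits than the original 8-bit values, yet the decoder can rebuild the line exactly.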

Of course, as has been well understood since Shannon described the
mathematical basis for communication, a price to be paid for the
efficiency of redundancy-reduced transmission is increased susceptibility
to channel distortions. This is as true of bit-errors in digital
transmission as it is of the wide class of impairments which can mar the
reception of analogue pictures. A practical transmission system must
therefore leave or reintroduce some controlled redundancy in the signal.
In the case of digital transmission this is most often in the form of
forward error correcting coding.
The subjective acceptability of the quality of 
pictures which have been received after redun- 
dancy-reduced transmission is perhaps the least 
understood of all the issues associated with this 
new technology. The reason for this is that over 
the last few years there has been a change in 
attitude to the quality of pictures which should be 
delivered to the home by future digital services. 
Previously the philosophy adopted was that digital services should always
produce distortion-free pictures even on the most demanding of scene
content. To achieve this quality the capacity required from a constant
bit-rate channel must be high enough to cope with worst-case pictures even
though such pictures may occur in less than one per cent of typical
television programming. A more commercial judgement is to accept some
distortion on these rare pictures, and to trade the resultant bit-capacity
saving for more resolution on the vast majority of pictures. To determine
the optimum trade-off requires statistical analyses of the criticality of
typical broadcast television and extensive subjective testing to determine
the tolerance of the viewer to occasional visible distortion.

[Figure: Structure of a Digital Image Coder (input labelled "Fixed
Data-rate PCM").]


A Model of a Digital Picture Coder


It will be clear from the previous discussion that the behaviour of
digitally-coded images differs significantly from analogue systems such as
PAL or MAC. To a large extent the subjective performance of analogue
picture transmission can be predicted from the noise and bandwidth of the
channel. There are even graphs in the CCIR publications which help us to
do this. In the case of redundancy-reduced digital transmission however we
must really regard the channel as being information limited rather than
bandwidth limited. This means that after redundancy has been removed from
images of a particular scene, the coding system addresses the question of
whether the resulting average bit-rate is less than that of the channel.
If it is, then distortion-free transmission is possible for that scene; if
it is not, then some distortion (e.g. quantisation) must be introduced to
lower the information, and it is one skill of the system designer to
ensure that the distortion causes minimum subjective annoyance to the
viewer.

The figure below shows a model of all current digital image coders. A
method of statistical redundancy removal is first applied to the incoming
picture samples. This will result in a data output which is variable
because some parts of the scene will contain more redundant information
than others. A plain blue sky for example, is very redundant and can be
represented by many fewer bits than say, a cornfield waving in the wind
beneath it. Since a variable data-rate is not suited to the fixed-rate
channel, a buffer store is provided (typically holding between one and two
frames of coded data) to average the capacity. Notice that this allows us
to have a variation in bit-allocation across the picture so that we can
“save” bits on the coding of the sky, and “spend” them on the cornfield
where they are needed most. If the contents of the buffer begin to rise
then this is an indication that the scene being coded contains too much
information for the channel, so the feedback mechanism acts to introduce
some of the subjectively-tailored distortion, which in turn, acts to
reduce the data entering the buffer. The important point to remember from
this is that critical scenes, those which are very active over the whole
screen, will suffer more distortion than non-critical scenes. This
scene-dependent quality is a function of redundancy-reduced transmission
and does not generally afflict other image coding methods (although due to
the occurrence of cross-effects, PAL has scene-dependent quality, and
strenuous attempts have been made over many years to remove it!). It is
therefore meaningless to make the comparisons, so often written about in
the technical press, that digital picture quality at some fixed bit-rate
is equivalent to that offered by an analogue system, such as VHS for
example - one must ask on which material the comparison was made.
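
The buffer feedback loop in this model can be sketched in a few lines. This is a toy simulation under stated assumptions, not the control law of any real coder: the linear relation between buffer fill and quantiser step, the rule that bits produced fall in proportion to the step, and all the numbers are hypothetical.

```python
def simulate_rate_buffer(demand_bits, channel_bits, buffer_size):
    """Return (quantiser_step, buffer_fill) after each coded block."""
    fill = 0.0
    history = []
    for demand in demand_bits:
        # Assumed feedback law: the fuller the buffer, the coarser the step.
        step = 1.0 + 7.0 * fill / buffer_size
        # Crude model: bits produced fall in proportion to the step size.
        produced = demand / step
        # The channel drains a fixed number of bits per block period.
        fill = min(buffer_size, max(0.0, fill + produced - channel_bits))
        history.append((step, fill))
    return history


# "Sky": little information per block. "Cornfield": far more than the channel carries.
sky = simulate_rate_buffer([50] * 20, channel_bits=100, buffer_size=1000)
cornfield = simulate_rate_buffer([400] * 20, channel_bits=100, buffer_size=1000)
```

In this sketch the sky never fills the buffer, so the quantiser step stays at its finest setting, while the cornfield drives the step upwards until the coded data rate settles near the channel rate. That is the scene-dependent distortion described above.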

It is worth noting that in hybrid analogue/digital coding, of which the
HD-MAC algorithm is a well-developed example, there is a mixture of the
information-limited and bandwidth-limited types of transmission. The
redundant, but rugged and compatible analogue part, exhibits the
characteristics of an analogue system, while the digital motion vector
assistance channel which is part of the high definition
redundancy-reduction system, gives a scene-dependent distortion
characteristic.


Redundancy-reduction Picture Coding using the DCT


There are many techniques in use and under study for the
redundancy-reduction coding of visual services, ranging in application
from videophone at about 46kbit/s to contribution quality HDTV at
140Mbit/s. Almost universally these systems are employing a technique
known as motion-compensated hybrid discrete cosine transform coding. There
is however a host of variations on this basic theme; some employ:
different processing block sizes; different quantisers; spatial
interpolation; temporal interpolation; different variable length codes;
and different adaptation criteria.

In this section the principles are illustrated by an easy-to-follow
numerical example which concentrates on some of the basic ideas while
avoiding the detailed complexities of full system specifications. Those
requiring a more thorough treatment should consult [2] or [3]. The example
chosen is realistic however and takes the reader from a block of samples
of a real image, all the way to the redundancy-reduced digital
representation that is conveyed through the channel. We begin by
considering intra-field coding where no use is made of the information in
previous frames. This is similar to the schemes being proposed for still
picture transmission by ISDN or level-5 teletext.


Intrafield DCT-based coding 


(i) Divide the picture into blocks 


The fundamental element of this form of coding is a block of samples; most
commonly it is square and of dimension 8 x 8. For illustrative purposes
here a smaller block size of 6 x 6 has been chosen. In a practical
hardware picture coder the blocks would be formatted from the incoming
raster-scanned picture by the use of line stores. The following is one
block of source samples from an image:
(ii) Take the 2-dimensional Discrete Cosine Transform


In this stage we express the 36 samples in terms of 36 coefficients of the
DCT. The coefficients can be thought of as weights which, when multiplied
by their own characteristic 6 x 6 pictorial patterns and summed, give the
original picture. Thus the action of the DCT is to express the original
picture block as a two-dimensional series in terms of this set of
orthogonal characteristic patterns. This is close to the operation of the
two-dimensional discrete Fourier transform except that here the resulting
coefficients are all real.


Mathematically, the forward transform is expressed:

$$X(k,l) = \frac{4\,c(k)\,c(l)}{36} \sum_{i=0}^{5} \sum_{j=0}^{5} x(i,j) \cos\frac{(2i+1)k\pi}{12} \cos\frac{(2j+1)l\pi}{12}$$

Where: x(i,j) are samples in the image domain 6 x 6 block

X(k,l) are coefficients of the 6 x 6 transformed block

$$c(w) = \begin{cases} 1/\sqrt{2} & \text{for } w = 0 \\ 1 & \text{for } w \neq 0 \end{cases}$$


So for our sample block the DCT coefficients are: 


Notice that the distribution of coefficients in the transformed block is
far from uniform. This is because the transform concentrates the energy
into the top left-hand coefficients which tend to represent the lower
frequencies in the original sample block. The top left-hand coefficient
itself represents the dc component of the block (it is actually twice the
mean). The bottom right-hand quadrant has hardly any significant
coefficients at all. The goal of the coding scheme is to convey to the
receiver a representation of this transformed block so that it can perform
the inverse transform to reconstruct the picture (or a close approximation
of it). The DCT has served to highlight the redundancy in the data by
virtue of the distribution of coefficients seen here which is typical of
natural images. We now have to consider how we might exploit this
redundancy to derive an efficient binary description of this block for
transmission.
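
The transform stage can be tried out with a direct, unoptimised sketch. The scaling used here is an assumption: it is chosen so that the top left-hand coefficient equals twice the block mean, as the text states, and the function names are illustrative. A practical coder would use a fast algorithm rather than these nested loops.

```python
import math

N = 6  # block dimension used in the worked example


def c(w):
    """Weighting factor for the zero-frequency basis function."""
    return 1 / math.sqrt(2) if w == 0 else 1.0


def basis(i, k):
    """One-dimensional cosine basis value for sample i and frequency index k."""
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct2(block):
    """Forward 2-D DCT, scaled so that the dc coefficient is twice the block mean."""
    return [[4 * c(k) * c(l) / (N * N)
             * sum(block[i][j] * basis(i, k) * basis(j, l)
                   for i in range(N) for j in range(N))
             for l in range(N)]
            for k in range(N)]


def idct2(coeffs):
    """Inverse transform matching the scaling of dct2."""
    return [[sum(c(k) * c(l) * coeffs[k][l] * basis(i, k) * basis(j, l)
                 for k in range(N) for l in range(N))
             for j in range(N)]
            for i in range(N)]


# Hypothetical test blocks: a flat grey area and a diagonal ramp.
flat = [[36] * N for _ in range(N)]
ramp = [[10 * i + j for j in range(N)] for i in range(N)]
flat_coeffs = dct2(flat)
ramp_back = idct2(dct2(ramp))
```

A flat block of value 36 produces a dc coefficient of 72 and nothing else, which shows the energy compaction in its most extreme form. The ramp block survives the forward and inverse transforms unchanged apart from floating-point rounding.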


(iii) Thresholding

A simple technique which is sometimes applied as a pre-process is to
discard any coefficients falling below some threshold, the assumption
being that they do not make a significant contribution to the image. For
the purposes of this example let us discard any coefficient smaller than 2
in magnitude, i.e.


If |X(k,l)| < 2 then X(k,l) = 0 for all k,l.


The redundancy is now very clear in three ways: the high proportion of 0s;
the clustering of significant coefficients towards the top-left of the
block; and the relatively small values of the significant coefficients,
given that the magnitude of each could extend to well over 200 in value
(the dc term is always a special case, being more sensitive and having a
very different statistical distribution to the other coefficients for
natural images).


(iv) Quantisation


It has been observed experimentally that it is not necessary to convey to
the receiver the full numerical precision of the DCT coefficients to
achieve excellent quality reproduction, so the range of possible values
which must be accommodated in the coding can be reduced by the process of
quantisation. In fact the human viewer is more tolerant of quantisation
noise in the higher frequency (strictly: sequency) coefficients towards
the bottom right of the block. Practical quantisation schemes take this
into account by defining a separate quantisation law for each coefficient.
The quantiser in a real-time video encoder will also be controlled by
feedback from the rate-buffer as we have already seen.

For the purposes of our example let us assume a simple linear quantiser
with a step size of 3, which allows only output values of 0, ±3, ±6,
±9,... Quantisation of the dc term can lead to the visibility of
blockiness in the received picture, so it is usual to preserve about
10-bit accuracy for this. The quantisation will have a significant effect
in terms of information reduction but will trade this for inaccuracy or
noise in the reconstructed picture.
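
The thresholding and quantisation steps can be sketched as follows. The threshold of 2 and the step size of 3 come from the example; the rounding rule (nearest multiple, halves rounded up), the treatment of the dc term as a plain integer, and the sample coefficient values are assumptions made for illustration.

```python
import math


def threshold(coeffs, limit=2.0):
    """Zero every coefficient whose magnitude is below the limit."""
    return [[0.0 if abs(v) < limit else v for v in row] for row in coeffs]


def quantise(coeffs, step=3):
    """Round each coefficient to the nearest multiple of step."""
    out = [[step * math.floor(v / step + 0.5) for v in row] for row in coeffs]
    # The dc term is kept at unit precision rather than quantised coarsely.
    out[0][0] = math.floor(coeffs[0][0] + 0.5)
    return out


# Hypothetical 2 x 2 corner of a coefficient block.
example = [[72.4, -20.2], [4.4, 1.9]]
coded = quantise(threshold(example))
```

Here the value 1.9 is discarded by the threshold, -20.2 becomes -21 and 4.4 becomes 3, so only multiples of the step size remain to be coded.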




(v) Block Scanning


So far our block remains two-dimensional but will need to be serialised
for transmission along a one-dimensional channel. In performing this
scanning operation let us exploit the compaction of significant
coefficients towards the top-left by performing a zig-zag scan. Practical
coders use a fixed scan which is statistically optimised on real pictures.
It is less regular but only slightly more efficient than that used in our
example:


The scanned order is then:


72, -30, -21, -6, 18, 9, 3, -6, 9, 0, 0, 0, 3, 0, 0, 0, 0, 0, 3, EOB
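
A zig-zag scan is easy to sketch. The direction in which each diagonal is traversed is an assumption here (the scan diagram is not reproduced), and the 6 x 6 block is simply rebuilt from the scanned values of the example with the dc term, 72, first, so that the scan can be checked against them.

```python
def zigzag_order(n):
    """Return (row, col) pairs of an n x n block in zig-zag order from the top-left."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Alternate the direction of travel on successive diagonals.
        order.extend(diagonal if s % 2 else reversed(diagonal))
    return order


def scan_block(block):
    """Zig-zag scan a square block, replacing the trailing zeros by 'EOB'."""
    values = [block[r][c] for r, c in zigzag_order(len(block))]
    while values and values[-1] == 0:
        values.pop()
    return values + ["EOB"]


# Place the example's scanned values back into a 6 x 6 block of zeros.
scanned = [72, -30, -21, -6, 18, 9, 3, -6, 9, 0, 0, 0, 3, 0, 0, 0, 0, 0, 3]
block = [[0] * 6 for _ in range(6)]
for (r, c), value in zip(zigzag_order(6), scanned):
    block[r][c] = value
```

Scanning the rebuilt block returns the 19 listed values followed by EOB. The remaining 17 positions of the 36 are all zero and are never sent individually.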

Notice that a convenient shorthand here is the device “EOB” or End of
Block, which represents the 17 remaining zeros, or nearly 50% of the
block. This same approach is used in the coding for transmission where, as
will be seen, we reserve a special EOB codeword. This is possibly the most
significant information coding trick which is used.

Notice also that, as a result of the scanning, a number of runs of
successive zeros have occurred. A set of special codewords, each
representing a run-length of zeros, will also be an efficient coding
trick. The transmitted representation is then:


72, -30, -21, -6, 18, 9, 3, -6, 9, 000, 3, 00000, 3, EOB


Yet another very important coding trick is one which recognises that small
coefficient values are far more likely to occur than large ones, and that
short run-lengths of zeros are more likely to occur than long ones. This
is called variable word-length coding (VLC) and consists of the
allocation, to values, of binary codewords which have different lengths
depending upon the probability with which they are expected to occur. A
small coefficient value, for example, is more likely and will be assigned
a shorter representative binary code than a large coefficient value.


(vi) Variable Length Coding


Not just any set of binary codewords of differing lengths will serve the
purpose of our coding scheme. The set must have the property that when the
words are sent through the channel in a long string of bits, the decoder
can determine where one codeword ends and the next begins. This property
is most conveniently achieved by ensuring that no complete codeword in the
set of possible codewords is a prefix of any other. Methods also exist for
the optimisation of codeword lengths to suit the statistics of the items
of information (symbols) which it is wished to code.

For our purposes the following codeword set has been derived:


Coefficient
Magnitude        Codeword

3                1
6                001
9                0111
12               00001
15               01101
18               011001
21               0000001
30               01100000

RL Prefix        010....
EOB              0001


Run Length       Suffix following
of Zeros         the RL Prefix

1                11
2                101
3                011
4                0101
5                0011
7                10010


The reader can verify that a decoder is able to determine the beginning
and end of codewords when conveyed through the channel by interpreting the
following binary sequence:


011011011110001 which decodes into: 01101, 1, 0111, 1, 0001
or: 15, 3, 9, 3, EOB


Notice that the prefix property has allowed us to define a single prefix,
010, which represents the beginning of a run of zeros. Its precise length
is then signalled by the suffix which is itself a set of variable
word-length codes.

The sign of the coefficients is appended to each variable length word,
where it will be assumed that a 1 represents a -ve value and 0 a +ve
value. The dc term is represented as a fixed length +ve binary word.

The assembled bits for the transmission of our coded block are shown
below. It can be seen that 73 bits are needed to represent the 36 original
samples, so our representation is at only 2.03 bits/sample. Below is shown
the error resulting from subtracting the decoded sample values from the
original ones. The mean square error is 4.42 (expressed in terms of the
discrete levels of the 8-bit input samples).


72 || 01001000
-30 || 011000001
-21 || 00000011
-6 || 0011
18 || 0110010
9 || 01110
3 || 10
-6 || 0011
9 || 01110
000 || 010011
3 || 10
00000 || 0100011
3 || 10
EOB || 0001


01001000 011000001 00000011 0011 0110010 01110 10 0011 01110 010011 10
0100011 10 0001


Bits representing the coded block.


[Figure: The Coding Error.]


Motion-compensated interframe DCT based coding


This coding employs essentially the same methods as the intrafield coding
except that the basic coding block is not formed from picture samples but
instead is the error resulting from an attempt to predict the sample
values in the block, from the contents of the previous frame. The simplest
form of such prediction is that which takes the co-sited block from the
previous frame as the prediction. Naturally this works well in static
picture areas but not in moving ones. A more efficient approach is to
offset any motion which has occurred between the current block and
previous frame, and to use a shifted block from the previous frame as the
prediction. This method is termed motion-compensated prediction.

In order to find the displacement which has occurred between the current
block and its most appropriate match in the previous frame, a technique
called block matching is usually applied. It is a simple scheme and is
illustrated below, where the block of 2N x 2N samples (N is 3 here) from
the current frame n, X_n(i,j), is compared with every possible location in
a search area of 2M x 2M samples from the previous frame n-1. D(Δi, Δj)
denotes the value of the matching function at the displacement Δi, Δj. The
values Δi, Δj (lying in the range N-M ≤ Δi, Δj ≤ M-N) which correspond to
the best match (i.e. minimum D) are taken to represent the displacement
vector for that block.

$$D(\Delta i, \Delta j) = \sum_{i=1}^{2N} \sum_{j=1}^{2N} \left| X_n(i,j) - X_{n-1}(i+\Delta i,\, j+\Delta j) \right|$$

The displacement vector for each block is conveyed to the decoder along
with the DCT-coded prediction error so that it can form the same
motion-compensated prediction as the encoder, and can add back the
prediction error to recover the transmitted picture.


(For Illustration of Block Matching see next page.)


The block below was obtained as the motion-compensated prediction error
for the samples in the previous example:


(ii) Computing the DCT


Notice that the distribution of DCT coefficients is similar to that
obtained before, being clustered towards the top-left. The dc coefficient
however is now equivalent to (twice) the difference in means between the
block and its prediction; it can therefore take on -ve as well as +ve
values:


[Figure: The Principle of Block Matching.]
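
Block matching can be sketched as an exhaustive search. The matching function is assumed here to be a sum of absolute differences, the frame contents are synthetic, and the function and parameter names are illustrative. The `search` parameter plays the role of M - N in the text.

```python
def block_match(current, previous, top, left, size, search):
    """Return (best_D, di, dj) for the block of the current frame at (top, left)."""
    best = None
    for di in range(-search, search + 1):
        for dj in range(-search, search + 1):
            r0, c0 = top + di, left + dj
            # Skip candidate positions that fall outside the previous frame.
            if r0 < 0 or c0 < 0 or r0 + size > len(previous) or c0 + size > len(previous[0]):
                continue
            d = sum(abs(current[top + i][left + j] - previous[r0 + i][c0 + j])
                    for i in range(size) for j in range(size))
            if best is None or d < best[0]:
                best = (d, di, dj)
    return best


# Synthetic 16 x 16 frames: every sample of the previous frame is distinct,
# and the current frame is the previous one shifted by (+1, -2).
previous = [[16 * r + c for c in range(16)] for r in range(16)]
current = [[previous[r + 1][c - 2] if r + 1 < 16 and c - 2 >= 0 else 0
            for c in range(16)] for r in range(16)]
match = block_match(current, previous, top=5, left=5, size=6, search=3)
```

For this synthetic pair the search finds a perfect match (D = 0) at displacement (1, -2), which would be sent to the decoder as the motion vector for the block.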


(iii) The Other Coding Operations


Rather than repeat all the coding operations, the coded block data is
given below. The dc term has a variable length coded value in the
interframe coding because its statistics differ from that of the
intrafield case. The variable length code set assigned to the dc term does
not have to fit in as a subset of the coefficient code because the decoder
knows that the dc term will be the first to be transmitted and that it
will only be sent once.


0 || 00
0000 || 0100101
3 || 10
0 || 01011
-3 || 11
-3 || 11
0000000 || 01010010
6 || 0010
3 || 10
00 || 010101
3 || 10
EOB || 0001
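
The run-length and variable-length coding of both worked examples can be checked with a short sketch. The codeword tables below are partial: they hold only the assignments that can be read off the examples in this paper, and the function and variable names are illustrative.

```python
# Partial codeword tables: only entries that appear in the worked examples.
COEFF_CODE = {3: "1", 6: "001", 9: "0111", 15: "01101",
              18: "011001", 21: "0000001", 30: "01100000"}
RUN_SUFFIX = {1: "11", 2: "101", 3: "011", 4: "0101", 5: "0011", 7: "10010"}
RL_PREFIX = "010"
EOB = "0001"


def encode_ac(values):
    """Encode scanned coefficients; zeros after the last non-zero value are implied by EOB."""
    words, run = [], 0
    for v in values:
        if v == 0:
            run += 1
            continue
        if run:
            words.append(RL_PREFIX + RUN_SUFFIX[run])
            run = 0
        sign = "1" if v < 0 else "0"  # sign bit appended to the magnitude codeword
        words.append(COEFF_CODE[abs(v)] + sign)
    words.append(EOB)
    return "".join(words)


# Intrafield example: dc term 72 sent as a fixed-length 8-bit word.
intra_ac = [-30, -21, -6, 18, 9, 3, -6, 9, 0, 0, 0, 3, 0, 0, 0, 0, 0, 3]
intra_bits = format(72, "08b") + encode_ac(intra_ac)

# Interframe example: dc term 0 uses its own variable-length word, "00".
inter_ac = [0, 0, 0, 0, 3, 0, -3, -3, 0, 0, 0, 0, 0, 0, 0, 6, 3, 0, 0, 3]
inter_bits = "00" + encode_ac(inter_ac)

print(len(intra_bits), len(inter_bits))
```

The two strings come out at 73 and 46 bits respectively, the totals quoted for the intrafield and interframe blocks.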


The total number of bits is 46 giving an average rate of 1.28 bits/sample
but to this must be added the small overhead of the motion displacement
vector. The error resulting from decoding the transmitted block and
subtracting from the original differential samples, has a mean square
error of 2.14 and a signal to noise ratio of 44.8dB.
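
The paper does not state how its signal to noise ratio is defined. As a hedged cross-check, the sketch below assumes the common convention for 8-bit pictures, peak signal (255) to rms error, and applies it to the two mean square errors quoted in the examples.

```python
import math


def psnr_db(mean_square_error, peak=255):
    """Peak signal to noise ratio in dB (assumed definition: peak^2 / MSE)."""
    return 10 * math.log10(peak ** 2 / mean_square_error)


intrafield_db = psnr_db(4.42)   # intrafield example
interframe_db = psnr_db(2.14)   # motion-compensated interframe example
```

Under this assumption the intrafield block gives roughly 41.7 dB and the interframe block roughly 44.8 dB, so halving the mean square error is worth about 3 dB.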


" Adaptive 
interframe/intrafield coding


In most recent sophisticated coders a decision is made on a block-by-block
basis whether to employ motion-compensated interframe coding or intrafield
coding and a single control bit is conveyed between encoder and decoder to
indicate the mode in use. More often the interframe mode proves better and
is chosen, but occasionally, because motion within the scene is erratic or
some unpredictable background has suddenly been revealed, the intrafield
option results in a lower volume of data for the block.

Because it does not rely on reconstructed previous frames, the intrafield
mode is deliberately chosen for a number of blocks in each field, on a
periodic basis, as a means of flushing-out any persistent error effects
which would otherwise propagate from frame-to-frame (because of the
interframe prediction). This process is known as refreshing and might
typically have a period of 1 second, during which every block position in
the picture will have been coded in the intrafield mode.
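
The mode decision and the periodic refresh can be sketched together. The refresh schedule used here (block positions taken in turn, one group per field) and the period of 50 fields, standing in for 1 second, are assumptions; the text only requires that every block position is coded in the intrafield mode once per period.

```python
def choose_mode(inter_bits, intra_bits, block_index, field_index, refresh_period=50):
    """Return 'intra' or 'inter' for one block of one field."""
    # Forced refresh: each block position is coded intrafield once per period.
    if block_index % refresh_period == field_index % refresh_period:
        return "intra"
    # Otherwise pick whichever mode produces the lower volume of data.
    return "intra" if intra_bits < inter_bits else "inter"


normal = choose_mode(inter_bits=46, intra_bits=73, block_index=7, field_index=3)
revealed = choose_mode(inter_bits=90, intra_bits=73, block_index=7, field_index=3)
refreshed = choose_mode(inter_bits=46, intra_bits=73, block_index=53, field_index=3)
```

The returned mode corresponds to the single control bit sent to the decoder: the first call picks interframe coding, the second falls back to intrafield because it is cheaper, and the third is forced to intrafield by the refresh schedule.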


Multiplexing and Error protection


The redundancy reduction coding is only part of the processing which goes
on in a full encoder. The coded data has to be multiplexed together in a
regular arrangement with the synchronisation words, mode control bits,
motion vectors, buffer occupancy information, audio, teletext, test data,
and so on.

The entire package of data is then protected with a forward error
correcting code. Many current coding systems use data interleaving and
large Reed-Solomon codes for this purpose because of their ability to
correct error-bursts which are a characteristic of many types of channel.
Some errors which cannot be corrected are detected by the picture decoding
algorithm because they give rise to anomalous reconstruction, such as
picture samples which lie outside of the source number range, or too many
samples in a block. In these cases the decoder can try to conceal them by
putting in their place samples from the previous frame derived as a best
guess. This can significantly help the subjective performance of the
decoder when it is subject to excessive channel errors.
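
The value of interleaving against error-bursts can be shown with a simple block interleaver. This sketch does not implement the Reed-Solomon code itself; the array dimensions and the position of the burst are hypothetical.

```python
def interleave(symbols, rows, cols):
    """Write row by row into a rows x cols array, read out column by column."""
    assert len(symbols) == rows * cols
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(symbols, rows, cols):
    """Undo interleave: restore the original row-by-row order."""
    assert len(symbols) == rows * cols
    return [symbols[c * rows + r] for r in range(rows) for c in range(cols)]


ROWS, COLS = 4, 6                    # four codewords of six symbols each
data = list(range(ROWS * COLS))
sent = interleave(data, ROWS, COLS)

# A burst wipes out four consecutive symbols in the channel.
received = list(sent)
for position in range(6, 10):
    received[position] = None

restored = deinterleave(received, ROWS, COLS)
errors_per_codeword = [restored[r * COLS:(r + 1) * COLS].count(None) for r in range(ROWS)]
```

After de-interleaving, the four-symbol burst has been spread so that each codeword contains a single error, which is far easier for an error correcting code to repair than four errors in one codeword.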


[Figure: block matching, showing the search area in the previous frame and
the block of samples in the current frame.]


The Concept of Hierarchical 
Picture Coding 


There is much interest in the concept that a single transmission format
could be decodable into pictures of more than one level of resolution. In
the terrestrial broadcasting of digital television, for example, it is
desirable that the signal be as acceptable for reception on small-screen
battery operated portables as it is on large-screen installations. This
means that ideally, the decoder complexity required for small-screen
resolution should be lower than that required to reproduce an HDTV
picture. This multi-resolution (and in practice 2-levels might be
adequate) requirement places particular constraints on the methods of low
bit-rate coding and multiplexing used. The figure below shows an example
of a hierarchical decoding environment [4].


A Hierarchy of Television Systems. 


Notice that a further consideration is that transcoding be possible to the
coded digital format employed in future digital storage media such as a
domestic VCR. If it is possible to achieve, it has the benefit that
quality loss due to the cascading of a full decoding operation with a
subsequent VCR-optimised digital coder could be avoided.

Currently much research work in low bit-rate coding of HDTV is aimed at
sub-band coding methods, where the two-dimensional picture resolution is
split by a bank of filters into a number of bands and each is individually
coded (see for example [5] and other papers from the same conference).
This does appear to have advantages for hierarchical decoding since, in
principle, it is only necessary to decode the appropriate sub-bands to
reconstruct pictures having various resolutions, as illustrated below.
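
The sub-band idea can be illustrated in one dimension with the simplest possible two-band split. The filters here (pairwise average and difference) are chosen purely for brevity and are an assumption; practical sub-band coders use longer filters and work in two dimensions.

```python
def analyse(samples):
    """Split a line into a half-rate low band and a half-rate high band."""
    pairs = list(zip(samples[0::2], samples[1::2]))
    low = [(a + b) / 2 for a, b in pairs]
    high = [(a - b) / 2 for a, b in pairs]
    return low, high


def synthesise(low, high):
    """Recombine both bands into the full-resolution line."""
    out = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])
    return out


line = [10, 12, 20, 22, 30, 28, 40, 44]   # hypothetical picture samples
low, high = analyse(line)
full_resolution = synthesise(low, high)
```

A small-screen decoder could stop after decoding `low`, which is already a half-resolution version of the line, while a full-resolution decoder also decodes `high` and recovers every sample exactly.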


[Figure: The Basic Principle of Sub-band Hierarchical Decoding.]


The practice is however less simple than portrayed since picture interlace
and the emergence of aliasing effects have to be dealt with. Hierarchical
schemes are not restricted to sub-band approaches; techniques based on the
DCT have been proposed as well as methods employing a hybrid of both DCT
and sub-band coding.


Quality Issues in Low Bit-rate Television


In section 2 the scene-dependent quality of low bit-rate digital
television was discussed with reference to the model shown in the first
figure. This fundamental property of digital coding with fixed bit-rate
transmission means that traditional subjective methods for assessing the
quality of a system are not adequate - by choosing the test picture
material appropriately we could achieve any desired outcome from such
tests.

Ideally we would like to employ a panel of viewers to grade the quality of
many thousands of scenes taken from the output of a (component) television
studio after they have passed through a low bit-rate codec. This would
give a ranking of the scenes as shown in the left-hand figure below, where
for example, only 1 per cent of all the material might be classed as
objectionable. Unfortunately such an approach would be impractical, but a
similar result can be obtained using a two-stage process.

The first stage relies on a statistical analysis of the thousands of
scenes by computer, to determine how critical they would appear to a
subjectively optimised coding algorithm. If the subjective optimisation is
good, the ranked order of these scenes will correspond closely to that
which would have been arrived at by the panel of viewers. The list of
rank-ordered scenes must then be calibrated according to recognised
subjective quality grades. This is done by taking a wide sample of scenes
from the ranking, passing them through a codec (or computer simulation),
and performing subjective assessments on this reduced set using
established Rec. 500 methodology, as illustrated in the right-hand figure
below. The result from this stage is a characterisation of the quality of
the coding algorithm for a particular source resolution, with respect to
typical television scene content.

[Figure: Service Quality by Subjective Assessment.]

This characterisation represents only half the story however, since we do
not yet know what level of tolerance viewers will exhibit to occasional
impaired scenes. The criterion which really counts here may be the
viewer's impression at the end of a real programme based upon an overall
appreciation and memory of the contents. Will this impression be of the
vast majority of excellent quality scenes or of the infrequent distorted
ones? Clearly a determination of this requires test procedures which are
more typical of real television viewing than those of Rec. 500. A more
thorough discussion of this is given in [6].

A computer analysis has been performed on over 27000 short picture
sequences which were captured at intervals from the output of 5 component
television services handling pop “videos”, current affairs, movies, sport
and general entertainment. Rank orderings and criticality distributions
have been derived for DCT-based coding, both collectively and for each
service. As might be expected the pop “video” channel exhibited the most
critical scenes but characteristic variations are clearly visible in the
statistics of the other channels. The sports channel statistics, for
example, reveal the frequent use of full-frame results captions and the
current affairs channel statistics have a very marked peak corresponding
to the live, well-lit, still-camera newsreader scene. These correspond to
typical videoconference scenes and provide an interesting statistical
characterisation of the distinction between videoconference material and
entertainment television.

[Figure: Calibrating Rank-ordered Scenes Using a Representative Sample.]


Conclusions 


In this paper it has only been possible to scratch the surface of what is
at present a major area of development and exploitation. The approach
taken for the description of the principles was one of numerical example,
which it is hoped, makes clear the underlying simplicity of the DCT coding
approaches by the avoidance of a rigorous mathematical treatment. Another
redundancy reduction technique, that of sub-band coding, was briefly
mentioned in the context of the hierarchical concept, which is intended to
ensure that a transmitted digital signal is well-suited to a range of
receiver environments and technologies. There are of course many other low
bit-rate coding schemes which could not be considered here but none has
achieved the widespread acceptance of DCT-based approaches, for which
specialist integrated circuits are now readily available.


The important issue of quality in low bit-rate television has been
explained because its variable nature makes it very different from that of
current systems. There will therefore be a need for new assessment
methodologies which allow the subjective characterisation of the
performance of low bit-rate television coding. In particular it will be
necessary to discover the levels of occasional distortion which will be
tolerated by the viewer of typical television programmes, since it is this
which will ultimately determine how much bit-rate reduction can be applied
to broadcast services.
The first section deals with the architecture and technology of modern computers. The
second section goes on to look at some of the broadcast applications for computer based
equipment.


Architectures of Modern Computers 


Hardware 


For many people a computer is synonymous with the hardware: the size and shape of the
case, the keyboard, the monitor or screen. Though as we shall see, appearances can be
deceptive. Just like cars, what is under the hood matters.


The Basic Building Blocks 


Many people will be familiar with the basic building blocks that make up a computer. I 
don't doubt that most people here would be able to draw a pretty accurate diagram of a 
computer without missing a major component. 


Central Processor Unit (CPU)


Diagram 1. shows what would be called a classical (or Von Neumann) microprocessor 
architecture. Up until the last few years, this would generally have been what you find 
inside most microprocessors. In fact, with only a small amount of magnification the 
building blocks you can see on the drawing would actually be seen in the silicon. 


This CPU can only do one thing at a time. This architecture relies on the basic cycle of:
fetch an instruction, process an instruction, manipulate the data and move on to fetch
the next instruction. The instructions are the most primitive functions possible on the
CPU. They comprise things like simply adding two numbers together, or testing to see
if a certain logical condition is true or false. Most users of computers never see (or want
to see) these actual instructions. Even someone writing software normally does this
using a high-level language which allows the user to program in a (hopefully)
understandable syntax. The language interpreter or compiler converts this text into the
low-level code which contains the CPU instructions. In fact executable software could
just be thought of as one very long number.
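
The fetch, process and move-on cycle can be sketched as a toy interpreter. The four-instruction set, the single accumulator and the memory layout are all invented for illustration and do not correspond to any real processor.

```python
def run(program, memory):
    """Execute a list of (opcode, operand) pairs one instruction at a time."""
    acc = 0   # single accumulator register
    pc = 0    # program counter: index of the next instruction
    while True:
        opcode, operand = program[pc]   # fetch
        pc += 1                         # move on to the next instruction
        if opcode == "LOAD":            # process: copy memory into the accumulator
            acc = memory[operand]
        elif opcode == "ADD":           # add a memory value to the accumulator
            acc += memory[operand]
        elif opcode == "STORE":         # write the accumulator back to memory
            memory[operand] = acc
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


# Add the numbers held in locations 0 and 1 and store the sum in location 2.
program = [("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", 0)]
result = run(program, [2, 3, 0])
```

Only one instruction is in progress at any moment, which is the limitation described above. In a real machine the opcodes and operands would themselves be stored as numbers, which is why an executable can be viewed as one very long number.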

Speed


Most people tend to judge a computer by the speed of the CPU chip. This is misleading.
For instance a 486 DX machine has a maths co-processor, whilst the 486 SX does not.
The DX chip will run engineering and maths programs much faster, because they use
lots of floating-point calculations (requiring operations on numbers with decimal
points in them). For ordinary word-processing, spreadsheets and databases both chips
will perform identically.


Doubling the speed of the CPU chip does not double the speed of a computer since the
memory and I/O do not speed up. For ordinary programs a 66MHz processor can be
expected to execute at 1.5 to 1.7 times the speed of a 33MHz processor.
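
The 1.5 to 1.7 figure can be related to the share of run time that actually scales with the CPU clock. The sketch below uses Amdahl's law, which is a modelling assumption added here rather than something stated in these notes; the function names are illustrative.

```python
def overall_speedup(cpu_fraction, clock_ratio=2.0):
    """Amdahl's law: only the CPU-bound fraction of the run time speeds up."""
    return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / clock_ratio)


def cpu_fraction_for(speedup, clock_ratio=2.0):
    """Fraction of run time that must be CPU-bound to observe the given speedup."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / clock_ratio)


low_estimate = cpu_fraction_for(1.5)    # about 0.67
high_estimate = cpu_fraction_for(1.7)   # about 0.82
```

On this model, a program that runs 1.5 to 1.7 times faster on a doubled clock spends roughly two thirds to four fifths of its time limited by the processor, with the remainder waiting on memory and I/O.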


At first, the specifications for different processors seem to quote a strange variety of
CPU clock speeds. These may be 20, 25, 33, 50, 66 or 100MHz. The numbers become
more meaningful if they are expressed in terms of the length of one clock cycle.
These cycle times are then expressed in nanoseconds (ns). Hence:-


Clock speed (MHz)     20    25    33    50    66    100


Cycle time (ns)       50    40    30    20    15    10
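
The conversion behind the table is simply the reciprocal of the clock frequency. The short sketch below (names are illustrative) reproduces the figures, rounded to the nearest nanosecond.

```python
def cycle_time_ns(clock_mhz):
    """Length of one clock cycle in nanoseconds for a clock speed in MHz."""
    return 1000.0 / clock_mhz


clock_speeds = [20, 25, 33, 50, 66, 100]
cycle_times = [round(cycle_time_ns(f)) for f in clock_speeds]
```

The 33MHz and 66MHz entries are rounded: their exact cycle times are about 30.3ns and 15.2ns.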


CISC versus RISC 


Most of the world's most popular and successful processors can be described as 
Complex Instruction Set Computers. The list of primitive commands that the processor 
is capable of understanding generally runs into hundreds. Along with some of these 
complex instructions comes the need for many tens of processor clock cycles in which 
to execute the various operations necessary to complete a single (albeit complex) 
instruction. Examples of CISC processors include the Intel 80x86 family and Pentium, 
the Motorola 680x0 and the DEC VAX processor. 


The idea behind RISC architecture is that many of the complicated commands can
actually be broken down into a small number of primitive functions. The complete list
of the instructions that the processor can understand is then cut down drastically. This
enables the processor designers to use clever techniques to implement and optimise the
smaller number of commands in the most efficient way possible. As a result of this
optimisation, in general most of the instructions in a RISC processor can be
executed in a single clock cycle. Here we get a double advantage, because a simpler
processor design means we can tolerate higher clock speeds. Examples of RISC
processors include the DEC Alpha, the IBM/Motorola PowerPC, the Acorn ARM, the
SUN SPARC and the MIPS R4000 series used by Silicon Graphics workstations.


Having said all this, don't fall into the trap of thinking that RISC processors are
always going to be faster than CISC processors. An Intel Pentium processor, highly
CISC in its architecture, will outperform a SUN SPARC20 RISC machine in some benchmark
tests. RISC processors in general have much better floating-point performance than
CISC ones, i.e. when working with real-world numbers. Also, RISC is not a new idea. A lot of the original
microprocessors had small numbers of instructions, largely because that was all that
was possible.


The choice of processor should then be made taking into account the job it's going to be
doing.


One of the most common ways in which modern processors are speeded up is to
implement a 'Pipeline' architecture. This is used to allow the next instructions to be
fetched whilst others are being executed, whilst data is being read from memory, whilst
data is being written back to memory etc. The benefits are obvious as the processor
is not kept waiting during the time it takes to access memory. The aim is to keep the
processor doing real work. The disadvantage is that it takes time to empty the pipeline
and time to fill it again should anything go wrong, or if the processor has to switch to
another task.
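
The gain from pipelining, and the cost of emptying and refilling the pipeline, can be put into rough numbers. The sketch below assumes an idealised pipeline in which every stage takes one clock cycle; the stage count, instruction count and number of flushes are made-up figures for illustration only.

```python
def cycles_unpipelined(instructions, stages):
    """Each instruction passes through every stage before the next one starts."""
    return instructions * stages


def cycles_pipelined(instructions, stages, flushes=0):
    """Idealised pipeline: one instruction completes per cycle once it is full.

    Each flush (for example after a task switch) is assumed to cost
    (stages - 1) extra cycles while the pipeline refills.
    """
    fill = stages - 1
    return fill + instructions + flushes * fill


print(cycles_unpipelined(1000, 5))             # 5000
print(cycles_pipelined(1000, 5))               # 1004
print(cycles_pipelined(1000, 5, flushes=50))   # 1204
```

Even with fifty flushes the pipelined figure stays far below the unpipelined one, but the penalty grows with the depth of the pipeline and with how often it has to be emptied.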
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Other tricks 

Another popular trick that speeds up processors is the use of an instruction cache. The
cache is memory that can be accessed very quickly. This could be either on the chip
itself, or could be located just next to the processor. The aim is to keep the cache full of
the instructions most likely to be requested next. This is done by reading more instructions
from the main memory than are actually necessary, based on the
assumption that the processor will most probably request the next instruction in the
program. Even this simple assumption helps to speed up the rate at which instructions can be
processed. As most programs contain conditional branches, i.e. If X then do Y else do
Z, the latest microprocessors try to predict where the program might jump to next and
fill the cache accordingly. This technique is called "branch prediction". Caches can also
be included for data as well as for instructions.
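
The notes do not say how a processor makes its prediction. One common textbook scheme, used here purely as an assumed example, is a two-bit saturating counter kept for each branch: it predicts that the branch will do whatever it has mostly done recently. The function name and the test pattern are illustrative.

```python
def prediction_accuracy(outcomes, counter=2):
    """Run a two-bit saturating counter over a list of branch outcomes.

    outcomes: sequence of booleans, True meaning the branch was taken.
    counter:  0-1 predict 'not taken', 2-3 predict 'taken'.
    """
    correct = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken == taken:
            correct += 1
        # Nudge the counter towards what actually happened, within 0..3.
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return correct / len(outcomes)


# A loop that jumps back nine times and then falls through, run ten times over.
loop_branch = ([True] * 9 + [False]) * 10
print(prediction_accuracy(loop_branch))   # 0.9
```

For a loop like this the predictor is wrong only on the final pass of each run, so the cache can be filled from the right place nine times out of ten.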


DSP 


DSPs, or Digital Signal Processors, are also worth a mention at this point, mainly because
they potentially have many applications in broadcasting. They are really a specially
developed kind of RISC processor for signal processing applications. The techniques used to
make frequently used CPU instructions go faster in a RISC architecture have been
applied to the mathematical operations frequently used in signal processing. The sorts
of processes we are talking about are operations like digital filtering, interpolation and
compression. By way of an example, here is how a DSP microprocessor uses specially
coded instructions to implement one stage of a digital filter.


An FIR digital filter may look something like:-


Written out directly:

        x1,x2,x3 = 0
again:  read x(0)
        y:=a0 * x0 + a1 * x1 + a2 * x2 + a3 * x3
        x3:=x2
        x2:=x1
        x1:=x0
        write (y)
        goto again


Using an accumulator (acc):

        x1,x2,x3 = 0
        y=0
again:  read x(0)
        acc:=a3 * x3
        x3:=x2

        acc:=acc + a2 * x2
        x2:=x1

        acc:=acc + a1 * x1
        x1:=x0

        acc:=acc + a0 * x0
        y:=acc
        write (y)
        goto again


Using a product register (p):

        x1,x2,x3 = 0
again:  read x(0)
        acc:=0
        p:=a3 * x3
        x3:=x2

        acc:=acc + p
        p:=a2 * x2
        x2:=x1

        acc:=acc + p
        p:=a1 * x1
        x1:=x0

        acc:=acc + p
        p:=a0 * x0

        acc:=acc + p
        y:=acc
        write (y)
        goto again


We have now derived a single step, comprising three instructions which can be used 
over and over again. This is important if, unlike our example, there are maybe one 
hundred taps in our digital filter. The TMS320Cxx DSP does these 3 steps in one 
instruction cycle with a single LTD instruction. 


Common DSP microprocessors are the Texas Instruments TMS320(C)xx series and the Analog
Devices ADSP-2100. You can find TMS320C14 DSPs on recent Fujitsu disk drives!


Generally speaking DSP processors are designed to be general purpose building blocks,
just in the same way that a Motorola 68000 could be described as a general purpose
microprocessor. A lot of the techniques used in building a processor such as
this, though, are being used to build highly specific chip-sets for particular purposes.


These same techniques, combining extra hardware on the silicon with specially
customised instructions designed with only one purpose in mind, are not confined
to DSP microprocessors. They are also found in video processor and MPEG compression
chip-sets.


Parallel Processing 


Parallel processing is another really hot topic at the moment. In essence the idea is to be 
able to speed up complex operations simply by adding more processors. This must be 
truly a hardware engineer's and marketing manager's dream. If only it were so simple. 


To make use of parallel processing, realistically you need a problem or computational
task that maps well onto the architecture in the first place. However, in broadcasting
you don't have to look very hard to find one: manipulating video images is a very
computationally intensive task that maps well onto parallel processing. Maybe the next
generation of DVEs will make use of parallel processors.


I made the point, if only it were so simple. A large requirement is put on the software to 
be designed in the first place to support multi-processors. Languages like OCCAM have 
provided this support, a result of mainly British development by INMOS in building the 
Transputer. 


A notable failure in the broadcast industry of a product which used these parallel
processors was a graphical animation product. This product used the raw computational
power of parallel processors to allow the graphic artist to draw and animate objects
based on vector primitives. The comparison in the PC world would be like
comparing a conventional painting package such as the Windows Paintbrush program
with a drawing package that allows manipulation of objects, such as the Microsoft Draw
program. One of the major factors in this product's demise was the lack of programming
support tools for the development environment.


Storage 
Memory 


Generally PCs use two different kinds of Random Access Memory (RAM). The main
memory is made of Dynamic Random Access Memory or DRAM. This memory can
normally be accessed in 70-80ns. The trend these days is to package the memory onto
Single In-line Memory Modules (SIMMs). Each memory location in a PC, for example, is required
to hold 8 bits of data, i.e. one byte. The memory location though is almost always given
an extra bit to serve as a parity bit. If there is a memory error, the parity bit can then
reveal that the byte has been corrupted.
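
How a parity bit catches an error can be sketched as follows. Even parity is assumed here simply to keep the example short, and the function names are illustrative.

```python
def parity_bit(byte):
    """Even-parity bit: 1 if the byte contains an odd number of 1 bits, else 0."""
    return bin(byte & 0xFF).count("1") % 2


def is_consistent(byte, stored_parity):
    """True if the byte still matches the parity bit stored alongside it."""
    return parity_bit(byte) == stored_parity


data = 0b10110010
stored = parity_bit(data)         # written to memory together with the byte
corrupted = data ^ 0b00001000     # one bit flips in memory

print(is_consistent(data, stored))        # True
print(is_consistent(corrupted, stored))   # False
```

A single parity bit detects any odd number of flipped bits in the byte, but it cannot say which bit is wrong, and two flipped bits cancel out and go unnoticed.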


You may come across older SIMMs which have 30 pins. These devices transfer one
byte of memory (plus parity) in each memory cycle. In a 486 PC, transfers have to be
done 4 bytes at a time (32 bits). In this case, the SIMMs have to be installed in groups of 4.
Modern machines use 72-pin SIMMs. These allow a full 4 bytes of data to be transferred
in one memory cycle. In this case on a 486 the SIMMs can be installed singly.


64-bit processors like the Pentium, PowerPC and DEC Alpha have to access memory 8
bytes at a time. In this case the memory may have to be installed 2 or 4 SIMMs at a time.
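
The grouping rule follows directly from dividing the width of the processor's data path by the width of one SIMM. A small sketch, with a function name chosen for this example:

```python
def simms_per_bank(bus_bits, simm_bits):
    """Number of identical SIMMs needed to fill the processor's data path."""
    if bus_bits % simm_bits:
        raise ValueError("bus width must be a multiple of the SIMM width")
    return bus_bits // simm_bits


print(simms_per_bank(32, 8))    # 486 with 30-pin SIMMs     -> 4
print(simms_per_bank(32, 32))   # 486 with 72-pin SIMMs     -> 1
print(simms_per_bank(64, 32))   # Pentium with 72-pin SIMMs -> 2
```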


The faster type of RAM found in computers is referred to as SRAM or Static RAM. This
is normally used to build the second-level cache for the processor (the first level being
on the processor itself).


Disks 


You have already had a whole evening session based on disk technology. In the current 
climate this is not unreasonable, as disk-based broadcasting is certainly the subject 
being talked about in all parts of the television and radio industry at the moment. The 
reasons for this are now well known. Disks now have the speed and capacity to make 
them a realistic storage medium for video. This is especially so when disk technology is 
married with today's compression systems.


Let us look at disks in the context of their role as the main mass-storage devices for computers.
Relatively speaking, access to disk is very much slower than access to main memory. It takes
about 100 nanoseconds to read some data from RAM. If the data is on the
disk, then it takes about 10 milliseconds to read a record from a fairly fast modern disk.
This means the disk is about 100,000 times slower, so any improvement
that can be gained in making the disk go faster is likely to have an effect on the overall
performance of the computer.


In the first generation of personal computers, the electronics to control a disk drive were
normally to be found on a separate disk controller card. As digital electronics has
progressed, the same technology that has given us faster processors and larger memories
has meant that the controller function can now be done by the drive itself or can be
integrated onto the main 'motherboard' of the computer. The most widespread disk
interface in use in high-end business and indeed in broadcasting is SCSI. The Small
Computer System Interface specification was designed to cope with the needs of many
computer peripherals. The concept is basically a protocol that allows 2 microprocessors
to communicate and transfer data. In this case the microprocessors are the SCSI
controller (or host) and the disk drive itself. SCSI is so useful because it removes from
the controller the need to know anything about the physical internals of the drive, for
example the number of platters, cylinders and sectors. The controller simply knows that
a particular device has a number of logical blocks that can be accessed. The block size
is normally either 512, 1024, 2048 or 4096 bytes. So the controller can ask for some
data that is stored in block number n without knowing physically how that data is
stored. The simplicity also applies to operations like formatting. All the controller has
to do is send the command 'format'. This should really be 'format yourself'. The disk
then returns with a success or failure code when it has finished.
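
To see what the drive is hiding, the sketch below turns a logical block number into a cylinder, head and sector for an imaginary drive. The geometry figures are invented, and a real drive's mapping is private to its own electronics and usually more involved; the point is only that this arithmetic happens inside the drive, not in the controller.

```python
def block_to_chs(block, heads, sectors_per_track):
    """Map a logical block number onto (cylinder, head, sector) for a simple
    drive with the same number of sectors on every track."""
    cylinder = block // (heads * sectors_per_track)
    head = (block // sectors_per_track) % heads
    sector = block % sectors_per_track + 1   # sectors conventionally count from 1
    return cylinder, head, sector


# Hypothetical geometry: 16 heads, 63 sectors per track.
print(block_to_chs(0, 16, 63))        # (0, 0, 1)
print(block_to_chs(1008, 16, 63))     # (1, 0, 1)
print(block_to_chs(100000, 16, 63))   # (99, 3, 20)
```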


SCSI has been used on the Macintosh computer ever since the Macintosh Plus. It is also the
standard to be found on most modern workstations. Beware the multitude of different
SCSI connectors!


In the PC these days, all of the disk drive electronics is built onto the drive itself. These
disks are called 'Integrated Drive Electronics' or IDE disks. The cable which connects the
disk to the computer is really just an extension of the PC's ISA bus, so there is no
controller as such. On many 'clone' PCs a small interface card is used (which may have
the circuitry for the serial and parallel ports as well) to plug into the ISA socket and
connect to the IDE disk drive. It is possible to connect 2 IDE drives to the same bus. In
this case the first drive acts as the master, and the second as the slave. Unfortunately
most computers cannot support IDE disks bigger than 540 Megabytes, although the very
latest machines and operating systems are likely to support them.


Since IDE is dependent on there being an ISA bus in the first place, it is highly unlikely
that you will find an IDE disk in a non-ISA bus machine.


Many of the recent advances in disk technology have not been specially developed for
broadcast use. In fact disk mirroring (or RAID level 1) has been around for ten
years or more. In the early days only high-end mainframe systems could afford twice as
many disks as they needed, but disks became more affordable before they became more
reliable, and for many mini-computers the additional expense was worthwhile when the
security of the data stored on the disks was at stake. Logica's own Gallery 2000 system
supported RAID 1 as far back as 1987.


I again come back to the eternal bandwidth problem. The principle and the
name of RAID (Redundant Array of Inexpensive Disks) imply redundancy and
therefore some form of protection against data loss. It appears, however, that a number of the
disk-based video servers that have been announced recently are using arrays of standard
disks to get the required bandwidth for video, but presently have to abandon the
redundancy or hot-swapping facilities possible with RAID, to concentrate on just being
able to play out one or more channels of video. Strictly speaking we should be talking
about the spread of 'AIDS' in broadcasting!


It appears that some companies have taken to calling RAID a Redundant Array of
Independent Disks. I think customers must have insisted that manufacturers drop the
'Inexpensive' when they saw the price!


Buses


A bus transfers addresses and data among the various devices in the computer. In a
modern computer there are likely to be several interconnected bus structures. The CPU
and main memory are connected by a bus running at the highest clock speed (normally
the 2nd fastest rate after the CPU's own clock). In the modern IBM-type PC this might
be called the "local bus". The mainboard will then provide one or more bus structures
to connect to other devices.


In the Macintosh the internals used to be easy to keep track of, as right from the very early
Macs the machines have used the same bus architecture, called NuBus. This is set to
change, largely due to the PC's chequered history.


In 1984 IBM started selling its PC AT model. The CPU, memory and I/O bus all shared
a common 8MHz clock. This internal bus became the basis for what we now know as
the ISA (Industry Standard Architecture) bus. The ISA bus was simple and IBM
didn't guard the technology by charging licence fees to other manufacturers. The ISA
bus supports transfers of only 2 bytes of data at a time, i.e. 16 bits. Its 24 address lines
also mean that the memory that an adapter can access is limited to 16 Megabytes.


In 1987 IBM introduced a new standard called the Micro Channel Architecture (MCA).
This was not compatible with ISA. It had clear advantages over ISA. Its clock ran at
10MHz, and the cards could be automatically configured with a single setup program.
On 386 and 486 PCs an MCA bus has access to the full 32-bit data path. IBM then
tried to capitalise on their clever ideas, and made other vendors licence the technology. MCA
cards therefore became expensive and their popularity dwindled. It is still possible to
come across IBM PS/2 machines which use MCA.


Because ISA had its limitations, other PC manufacturers got together and created the
Extended-ISA or EISA standard. This was aimed at high-end machines and servers
where performance was a problem with ISA. Like MCA, EISA was expensive, but it
supported the full 32-bit access to the processor. The ISA and EISA standards run at
8MHz, MCA at 10MHz. Whilst these data rates may have been adequate in 1984 and
1987, with modern motherboards running at 33 or 50MHz, the standards were too slow.
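
A rough upper limit for each bus can be worked out from its width and clock speed. The sketch below assumes one full-width transfer on every clock cycle, which real buses do not achieve, so the figures are ceilings rather than measured rates. The function name is an assumption made for this example.

```python
def peak_bandwidth_mbytes(width_bits, clock_mhz):
    """Theoretical peak in Mbytes/s, assuming one full-width transfer per clock."""
    return width_bits / 8 * clock_mhz


# Width in bits and clock in MHz, as quoted in the text above.
buses = {"ISA": (16, 8), "MCA": (32, 10), "EISA": (32, 8)}
for name, (width, clock) in buses.items():
    print(f"{name:<4} {peak_bandwidth_mbytes(width, clock):.0f} Mbytes/s")
```

Set against a motherboard that can move 4 bytes at 33MHz, even these optimistic ceilings show why the expansion bus became the bottleneck.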


The first solution was the VESA Local Bus (VLB), which became popular around the
start of '93. VESA (Video Electronics Standards Association) is a consortium of
companies who make displays and display adapters. Desktop PCs have started to
include one or two VLB slots to support high speed graphics adapters and maybe a fast
network card. A few vendors have produced VESA SCSI adapters or network cards,
but it remains largely used by graphics cards.


Starting in 1994, VLB, EISA and MCA are slowly being replaced by the PCI bus. PCI
(Peripheral Component Interconnect) was proposed by Intel. It combines a high clock
speed with a 32 or 64 bit data path, and has features which should make installation easier.
Some vendors call this 'plug and play'. No more address and interrupt clashes! (In
theory!)


PCI started to become popular on Pentium machines, because it could provide the 64-bit
data path (8 bytes at a time) needed by the processor and the new 64-bit graphics cards.
PCI looks set to be the standard for the future.


To come back to the Macintosh, although the first PowerPC based Mac was shipped
with the old NuBus, by the end of the year the new generation of Macs will be using
PCI. This will be the first time that both the Mac and PC have shared the same I/O
architecture.


The IBM version of the PowerPC and the DEC Alpha PC are also moving to the PCI bus.


A modern desktop PC will have a few high-speed slots. Normally ISA will still be
available as well. Desktop machines are less likely to need to be upgraded, in terms of
I/O, in future as many of the adapters that used to take up slots are already available on
the motherboard. It is not uncommon to find a machine with on-board graphics
adapter, network and SCSI!


No overview of a modern computer I/O would be complete these days without a look at 
networking technology. For most business applications, as well as broadcast 
applications a single isolated machine is not much use to anyone. Networks provide 
access to shared data, provide a means of controlling remote devices and provide a way 
for computers to intercommunicate. 


Networks and Communication 
Bandwidth. 


One of the most important factors in evaluating different network types is the actual
bandwidth the network can support. When we consider that an uncompressed 10-bit
Rec. 601 video signal requires a data rate of 270Mb/s, we shall see just how impressive
an interface it is when compared to current office type networks.
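
The 270Mb/s figure can be checked from the sampling rates used for 4:2:2 component video: 13.5MHz for luminance and 6.75MHz for each of the two colour-difference signals. The sketch below is simply that arithmetic; the constant and function names are assumptions made for the example.

```python
LUMA_MHZ = 13.5         # luminance samples per microsecond
CHROMA_MHZ = 6.75       # each of the two colour-difference signals
BITS_PER_SAMPLE = 10


def serial_video_rate_mbps(luma=LUMA_MHZ, chroma=CHROMA_MHZ, bits=BITS_PER_SAMPLE):
    """Total data rate in Mb/s for one luminance and two colour-difference signals."""
    return (luma + 2 * chroma) * bits


rate = serial_video_rate_mbps()
print(rate)        # 270.0
print(rate / 10)   # 27.0, i.e. how many 10Mb/s networks it would take
```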


Ethernet 


Perhaps the most commonly encountered network standard is Ethernet. It comes in
different guises. The two things they have in common, though, are the data rate, 10Mb/s
(27 times slower than serial digital video!), and the format of the data the network
carries. The only differences between the standards are the electrical properties of the
interconnections. The different guises of Ethernet can be described by the following:-


10BASE2 (10B2)


This is what is often called thin-wire Ethernet or sometimes cheapernet. The cable is
normally about 5mm in diameter and the standard uses BNC connectors and 'T' pieces
to loop the bus through each piece of equipment. The length of cable between 'T' pieces
must be greater than 0.5m. The 10 represents the data rate, 10Mb/s; BASE indicates that
the transmission is baseband (i.e. not modulated); and the 2 indicates that the maximum
length of the network is nominally 200m (the standard itself specifies 185m). Any break
in the cable brings the whole network down.


10BASE5 (10B5)


This is often referred to as thick-wire Ethernet. The cable is normally about 10mm in
diameter and coloured yellow. The maximum length of the cable is denoted by the 5
and is 500m. This was commonly installed in buildings as a 'backbone' network, where
a single cable is routed from one end of the building to another without any breaks.
Connections are then made by clamping 'tap-in' units on the cable to provide
connections to individual machines. Taps must be at least 2.5m apart. If the cable has to
be joined then 'N' type connectors are used. The computers are then normally connected
using an AUI cable (15-pin D-type). If the connection between the backbone and the
computer is broken then the rest of the network can still operate.


10BASET


Here the T stands for 'Twisted Pair'. The network data is carried by 2 twisted pairs
connected using RJ45 connectors. (These can be shielded (STP) or un-shielded (UTP)
type cables.) Two computers can be connected together using 10BASET connections if a
crossed (Tx-Rx, Rx-Tx) cable is used, but it is more normal to use a 'Hub' which allows
several computers to be configured in Star type configurations. Hubs can be connected
to other Hubs. The Hubs are active devices and therefore require power. Disconnecting
individual connections does not affect the rest of the network. 10BASET is very
popular in present computer installations because of the low cost per additional user,
because it is much less prone to total failure and because it is easy to isolate one part
of a large network from another.
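
The naming scheme used for these three guises is regular enough to be decoded mechanically. The sketch below is an illustration of the naming rule only; the function name and the wording of the result are assumptions, and the lengths it reports are the nominal ones implied by the name.

```python
import re


def describe_ethernet(name):
    """Split a name such as '10BASE5' into data rate, signalling and medium."""
    match = re.fullmatch(r"(\d+)BASE-?(\d+|T)", name.upper())
    if match is None:
        raise ValueError(f"not a recognised name: {name!r}")
    rate, suffix = match.groups()
    if suffix == "T":
        medium = "twisted pair"
    else:
        medium = f"coax, nominally {int(suffix) * 100}m per segment"
    return int(rate), "baseband", medium


for name in ("10BASE2", "10BASE5", "10BASET"):
    print(name, describe_ethernet(name))
```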


Token-Ring 


This is often seen as the poor relation of Ethernet. It was popularised by IBM and is
normally only encountered where IBM equipment is to be found. It comes in 4Mb/s and
16Mb/s varieties. Although the system uses the same 'Hub' principle as 10BASET, the
hubs, or MAUs (Multistation Access Units), are passive devices and do not require power.
The hub allows the ring structure to be preserved even if a connection is broken.


The 16Mb/s standard is therefore somewhat faster and more robust than Ethernet. Its
popularity is hindered by the quality and price of token-ring hardware and software.


Network Operating Systems and Protocols 


People tend to associate a network with a particular type of interface or type of cable.
The most important factor, though, in getting two computers to communicate over a
network is to know what network protocol is being used, and whether the network is being
used with a Network Operating System or NOS. This is the actual software that gives a
network user access to directories on remote machines, allows users to 'Login' at the
start of a session, and allows users to share resources over a network. Electronic Mail
may well be combined into a NOS, allowing users to address, send and receive mail.
Examples of Network Operating Systems are Novell NetWare, Microsoft Windows Network,
IBM LAN Manager, DECnet and LANtastic.


The Network Protocol defines what sort of low level commands the network can send 
and receive. Different protocols can exist on the same physical network. Different 
manufacturers have tended to adopt different protocols which makes life complicated. 
For instance Novell use the protocol called IPX, whilst IBM use NetBios or NetBeui. 
Probably the most independent of the protocols is the one known as TCP/IP. This is the 
protocol used to form the network of networks known as the Internet. TCP/IP is 
actually a whole suite of protocols designed so that it is easy for different computers 
and platforms to communicate. 


The same protocol can run on different types of network.


ISDN (Integrated Services Digital Network)


ISDN services are what many people consider to be the immediate future for
telecommunications. They provide relatively high-speed interconnections between users on a
dial-up basis. In terms of bandwidth, there are 2 offerings, a basic rate of 144Kb/s and a
primary rate of 2Mb/s.
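
Both rates are built up from 64Kb/s channels. The sketch below shows the arithmetic for the European form of the primary rate, which is assumed here because it is the one that gives 2Mb/s; the function names are illustrative.

```python
B_CHANNEL_KBPS = 64   # one 'bearer' channel carrying voice or data


def basic_rate_kbps():
    """Two B channels plus a 16Kb/s signalling (D) channel."""
    return 2 * B_CHANNEL_KBPS + 16


def primary_rate_kbps():
    """Thirty B channels, a 64Kb/s D channel and a 64Kb/s framing timeslot."""
    return 30 * B_CHANNEL_KBPS + 64 + 64


print(basic_rate_kbps())     # 144
print(primary_rate_kbps())   # 2048
```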


FDDI (Fibre Distributed Data Interface)


FDDI is not a new standard. It has been around for a number of years, but only recently
has it become affordable. The standard has 2 primary data rates, 100Mb/s and 200Mb/s.
The network consists of a dual concentric ring, or a series of rings of rings. This
topology is characterised by being very fast and robust, with the system being able to
cope with single points of failure, and in larger systems being able to re-configure itself
automatically.


FDDI is obviously of interest to broadcasters as it can provide data rates that can meet 
(or very nearly approach) the demands of real-time video. A 100Mb/s co-ax cable 
version is available! 


ATM (Asynchronous Transfer Mode) 


ATM is very hot technology at the moment, primarily because it can meet the
ever-increasing demands of multi-channel video distribution. Its distinguishing feature is that
it can tell apart the different types of data being carried on the network. This is
important for video and audio, which rely on a continuous and steady flow of data from
the sender to the receiver: they are sensitive to when and in what order the data arrives.
Most data traffic is 'bursty', with most devices on the network not needing to
communicate all the time, but requiring fast transfer times when they do.


ATM is the only standards based technology which has been designed from the 
beginning to accommodate the simultaneous transmission of data, voice and video. 


SDH (Synchronous Digital Hierarchy) 


SDH is worth noting for its techniques rather than dwelling on its use as a transmission
standard. The goal is to be able to have very high-speed data buses onto which the
senders and receivers can insert or extract data 'on the fly', without having to
de-multiplex the high-speed channel down to several slower rates.


Video Adapters 


The recent advances of computers into the television industry have as much to do with
the video display capabilities of computers as they have with raw CPU power. I can't
imagine a disk-based non-linear edit system having a simple text-based user interface. All
advances in multimedia technology rely on the computer screen being able to display
realistic real-world images. Television or video images are just one example.


To properly characterise displays and adapters it is necessary to take a quick look at the
underlying technology.


Display adapters can be characterised by:-


• Resolution
• Colour Depth
• Refresh Rate
• Bus interface
• Acceleration


Resolution


This refers to the number of dots on the screen. It is expressed as a pair of numbers that
give the number of dots on a horizontal line and the number of lines that make up the
picture. Some of the common resolutions in use today are:-


• 640x480 (VGA)
• 800x600
• 1024x768
• 1280x1024


The computer monitor is really no more than a high resolution TV monitor. Computer
monitors are fed with RGB signals in just the same way that video picture monitors are fed with
RGB video. Each of the Red, Green and Blue signals, though, can only be varied in 256 steps
(the output of an 8-bit DAC). If any combination of RGB in 256 steps is possible we get
what some manufacturers call 'true-colour' capability, which is about 16 million
different colours.


Colour Depth (number of colours) 


This is determined by the number of bits assigned to hold the colour value of each dot.


• 4 bits = 16 colours
• 8 bits = 256 colours
• 16 bits = 64K colours (32K on adapters that use only 15 of the bits)
• 24 bits = 16M colours (truecolour)


The display adapter has to store a value for every dot on the screen. The amount of
storage needed is determined by multiplying the resolution by the memory required for
each pixel. For example the original VGA display was 640x480x16 colours, which
required 256K of memory. An SVGA adapter with 512K can generally support
800x600x256 colours. If you need more colours at this resolution then you will need
more video memory (VRAM).
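
The memory requirement is easy to check for any combination of resolution and colour depth. The following is a minimal sketch; the function names are assumptions, and it ignores any extra memory a particular adapter may reserve for its own use.

```python
def colours(bits_per_pixel):
    """Number of different colours that a given number of bits can select."""
    return 2 ** bits_per_pixel


def frame_bytes(width, height, bits_per_pixel):
    """Bytes of video memory needed to hold one full screen."""
    return width * height * bits_per_pixel // 8


KBYTE = 1024
print(colours(24))                                 # 16777216
print(frame_bytes(640, 480, 4), 256 * KBYTE)       # 153600 fits in 262144
print(frame_bytes(800, 600, 8), 512 * KBYTE)       # 480000 fits in 524288
print(frame_bytes(800, 600, 16), 512 * KBYTE)      # 960000 does not fit
```

Memory is fitted in sizes such as 256K, 512K or 1M, so a screen that needs 150K still ends up on a 256K card.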


Refresh Rate 
The VGA standard originally ran at 60Hz, but some people complained that this
produced flicker. The commonly accepted standard for computer monitors is 70Hz.
Typically a good 'Multi-Sync' monitor can lock to rates in the range of 60-75Hz.
Certain combinations of high refresh rate, high resolution and full colour depth may be
too much for the D to A converter that is driving the monitor output, hence on some
graphics adapters you may be warned about selecting the correct resolution for a
particular card.


Bus interface 


We mentioned earlier when looking at Buses and I/O that the performance of a
particular bus is limited by its speed and width. A graphics card in a 16-bit ISA bus
slot running at 8MHz is going to have problems competing against a VESA bus card
which can attain transfer rates of up to 130Mbytes/s!


Acceleration 


Accelerator cards have a small CPU directly on the video card. Rather than send the
whole dot-by-dot representation of a picture to be displayed, this CPU can be sent
simple commands to draw a line for example, given the start and end co-ordinates, the
line width and colour etc. This cuts down the amount of information that has to be sent
from the main computer to the graphics adapter. It does mean that the software driver
running in the main processor has to be very tightly-coupled with the graphics adapter.
This is why it is necessary to have exactly the right video driver for the graphics
accelerator you are using.
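
The saving is easy to see by comparing the size of a command with the size of the dots it replaces. The command layout below is invented purely for illustration (real accelerators each define their own, which is exactly why the matching driver is needed).

```python
import struct

OPCODE_LINE = 0x01   # hypothetical opcode for 'draw line'


def draw_line_command(x0, y0, x1, y1, width, colour):
    """Pack an imaginary draw-line command: a 1-byte opcode, four 16-bit
    co-ordinates, a 1-byte line width and a 1-byte colour index."""
    return struct.pack("<B4H2B", OPCODE_LINE, x0, y0, x1, y1, width, colour)


command = draw_line_command(0, 0, 639, 479, 1, 15)
full_screen = 640 * 480          # bytes for a 640x480 screen at 8 bits per dot

print(len(command))              # 11
print(full_screen)               # 307200
```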


2.1 


Software 


I said earlier that computers are in some ways like cars. As far as speed is concerned
what really matters is what is under the bonnet, not how the exterior looks. To continue
the analogy further, a car wouldn't be much use if it didn't have a driver. The driver may
know how to drive a car, but if he or she doesn't know the route to the destination they
still have a problem. In this case the driver is likened to the operating system of a
computer. The operating system needs to know the various functions of the computer
before the system can run. The knowledge about how to get to the destination is like the
application software.


The Operating System 
Why do we need them? 


You could ask why do we need one? Why not let the car drive itself? Here are some of 
the reasons why we need operating systems. 


Resource Sharing. 


An operating system must share resources among a number of simultaneous users. A 
user may be a physical person in the case of a multi-user system, or could be a program 
or process running on a single user machine. The aim is to increase the availability of 
the computer to its users, and at the same time maximise the utilisation of resources 
such as processors, the memory, and the input/output devices. The importance of 
resource utilisation depends on the cost of the resources concerned - the decreasing cost 
of hardware has led to a decrease in the emphasis on resource utilisation, to the extent 
that in many systems there is no concept of logging the processor time used by a 
particular user. Indeed many micro computers are dedicated to a single purpose. 


Provision of a Virtual Machine 


The second major operating system function is to transform a raw piece of hardware
into a machine which is more easily used. This may be looked on as presenting the user
with a 'virtual machine', whose characteristics are different from, but more tractable
than, those of the underlying physical hardware. Some areas which are often different
are:- I/O, Memory, Filing system, Protection and error handling, Program interaction
and program control.


There is often much misconception when comparing and naming different operating
systems. Strictly speaking an operating system can only truly multi-task if it has access
to more than one processor, as a single CPU can only execute one instruction at a time.
What these systems do is present the user (or users) with the appearance that their different
processes are being run at the same time. To avoid confusion it is worth
differentiating further, and describing an operating system as either a co-operative
multi-tasking system or a pre-emptive multi-tasking system.


Co-operative multi-tasking


These systems rely on individual programmers ‘co-operating’ and releasing the central
processor so that other users can continue their processes. This operation is known as
'yielding', and is the mechanism that Microsoft Windows uses to perform multi-tasking
operations, i.e. Windows is a co-operative multi-tasking system. When programmers
issue a 'yield' command they do so in the hope that other processes will repay the
compliment and themselves yield in due course. This is very much a ‘leap of faith’, and
difficult to program. People who are familiar with Windows may have noticed that
most Windows installation programs do not yield; they have to complete or be aborted
before other programs can be switched to. System 7 on the Macintosh is another example
of a co-operative multi-tasking system.
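

The 'yielding' mechanism can be illustrated with a minimal sketch in Python, where each
task is a generator and the scheduler only regains control when a task chooses to yield.
This is not how Windows or System 7 is implemented; the names task, installer and
run_cooperative are invented for the illustration.

```python
from collections import deque


def task(name, steps, log):
    """A well-behaved task: does one step of work, then yields the processor."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # voluntary 'yield' back to the scheduler


def installer(steps, log):
    """A badly behaved task: does all of its work before yielding once."""
    for i in range(steps):
        log.append(f"installer:{i}")
    yield


def run_cooperative(tasks):
    """Round-robin scheduler that only regains control when a task yields."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run the task until its next yield
            ready.append(current)  # it yielded, so put it back in the queue
        except StopIteration:
            pass                   # the task has finished


log = []
run_cooperative([task("A", 2, log), task("B", 2, log)])
print(log)  # the two tasks interleave because both yield after each step
```

If installer() is placed in the queue, nothing else runs until it has finished, which is
the behaviour described above for installation programs.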


Pre-emptive multi-tasking 


In these systems the operating system can pre-empt a process and take away the use of
the central processor at will if the CPU is required to do a more important task. It is
then the job of the operating system to remember where a particular program had got to,
and to restore the system to that state before the process resumes its execution. This is
the way in which true multi-tasking systems operate. Microsoft Windows NT, OS/2, UNIX and
VAX/VMS are all pre-emptive multi-tasking systems.
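

As a rough illustration of pre-emption, the sketch below simulates a scheduler that takes
the processor away after a fixed time slice, saves where the process had got to, and
restores that position when the process next runs. It is a simulation only: the Process
class, the 'pc' field and the quantum are assumptions made for the example, and real
systems also take priorities and interrupts into account.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    total_ticks: int  # amount of work the process needs in total
    pc: int = 0       # saved context: how many ticks have already been executed


def run_preemptive(processes, quantum):
    """Simulate a time-sliced scheduler and return the execution trace."""
    trace = []
    ready = list(processes)
    while ready:
        proc = ready.pop(0)
        # Restore context: resume from proc.pc and run for at most one quantum.
        end = min(proc.pc + quantum, proc.total_ticks)
        trace.append((proc.name, proc.pc, end))
        proc.pc = end  # save context; the process itself is never asked to yield
        if proc.pc < proc.total_ticks:
            ready.append(proc)
    return trace


for name, start, end in run_preemptive([Process("installer", 5), Process("editor", 2)], quantum=2):
    print(f"{name} runs ticks {start} to {end}")
```

Here the long-running installer cannot lock out the editor, because the scheduler, not
the program, decides when the processor changes hands.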


Once you have a pre-emptive multi-tasking system, it is much easier to migrate that 
system to a multi-processor system. For example with Windows NT you can just add 
another processor card if you require more power. 


The official IEEE definition of a real time operating system is, "one that provides its
functions and responds to external, asynchronous events in a predictable amount of
time."


Most real-time systems are, in fact, fully fledged pre-emptive multi-tasking systems. A
key feature of a real-time operating system though, is the provision of 'feedback' from
the outside (real world) to the central system. This feedback may come from a real-time
clock, from motion or position sensors in the case of an industrial control system, or be
a signal derived from a video line or frame synchronisation pulse. This enables, for
example, a computer to trigger the switching of a video signal always during the
vertical interval. Clearly it is not necessary to have even a micro-processor to switch a
video signal, let alone one running a real-time, multi-tasking operating system!
For many more complicated broadcast systems using computer-based hardware, such as
Digital Disk Recorders, DVEs and Graphics systems, it is necessary though,
as there will always be functions that require some processing every line, field etc.


Kernels 


Some confusion arises when people refer to a 'kernel'. At its simplest level a kernel is
any piece of re-usable software on which many different applications may be built. This
could apply to the actual operating system of a computer or, for instance, to a sales
database kernel which is re-used every time it is customised for a particular user.


The Application Software 


The application software itself is a program, or a number of programs, executed or
run by the operating system. Modern desktop applications may be made up of several
programs and other components which are loaded and run as appropriate. It is worth
looking for a moment at some of the techniques used to create the software in the first
place.


3GLs and 4GLs 


A third generation language or 3GL is probably what we would recognise as being a
typical ‘high-level’ language, having flow control constructs like IF-THEN-ELSE,
DO-WHILE, DO-UNTIL, GOTO etc. Examples of 3GLs are languages like C, PASCAL
and FORTRAN. These languages are often described as ‘Procedural’. The programmer
specifies precisely how an application should function. The order in which operations
are carried out is rigidly controlled by the programmer.


Fourth Generation Languages (4GLs) are more oriented to the results of certain
operations, rather than the 'how'. This is sometimes called being non-procedural. 4GLs
were designed to speed up the programming of things that are commonly required. These may
be functions for accessing a database, or for programming a windows-based
user-interface. These are just two examples; there are many more. Let us look at each of
them.


Database Query 


A good example of a 4GL is the Structured Query Language (SQL) used to access
nearly all modern database systems. It allows the programmer to use something much closer
to real language when programming a particular function for a database. An SQL query might
take the form:


"Look through this stuff, and get me something that meets certain criteria; I don't care
how you do it, just do it."


Consider looking for an entry in an address book:


Select Surname, Forename, Initials
From My_Address_Book
Where Surname = 'Smith'


The amount of code we have to write is obviously a lot less than if we had to find the
result using a 3GL.
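

To make the comparison concrete, here is a small sketch that runs the address book query
through Python's built-in sqlite3 module and then finds the same entries with an explicit
3GL-style loop. The table name My_Address_Book, the sample rows and the function names are
invented for the illustration.

```python
import sqlite3

ROWS = [
    ("Smith", "John", "J.A."),
    ("Jones", "Mary", "M."),
    ("Smith", "Anne", "A.B."),
]


def build_address_book(rows):
    """Create an in-memory database holding the address book."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE My_Address_Book (Surname TEXT, Forename TEXT, Initials TEXT)"
    )
    conn.executemany("INSERT INTO My_Address_Book VALUES (?, ?, ?)", rows)
    return conn


def find_declarative(conn, surname):
    """4GL style: say WHAT is wanted and let the database decide how to find it."""
    cursor = conn.execute(
        "SELECT Surname, Forename, Initials FROM My_Address_Book WHERE Surname = ?",
        (surname,),
    )
    return cursor.fetchall()


def find_procedural(rows, surname):
    """3GL style: spell out HOW to search, one record at a time."""
    matches = []
    for row in rows:
        if row[0] == surname:
            matches.append(row)
    return matches


conn = build_address_book(ROWS)
print(sorted(find_declarative(conn, "Smith")))
print(sorted(find_procedural(ROWS, "Smith")))
```

Both functions return the same entries; the difference is that only the second one has to
know how the records are stored and in what order to examine them.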


Programming a Windows user-interface


Writing a program that displays a window with a couple of buttons on it and performs a
simple task can be a non-trivial programming job, which can require pages of code to
implement. 'Visual' programming languages like Microsoft's Visual Basic (VB) do a lot
of the hard work for a programmer by providing easily accessible ways of using basic
windows functions, like putting up a window or creating a button which can be clicked
on. The result may be just a few lines of code. Being able to build prototypes quickly is
a very powerful tool. Some special business applications are now being written wholly
in VB, as this allows companies to write specially customised programs for their own
needs. The penalty these applications pay is that it is VB that is in control of the
machine, not the programmer, and business-critical or speed-critical tasks may not be
reliable.


Object Orientated Programming 


The concept of Object Orientated Design (OOD) and Object Orientated Programming 
(OOP) has recently become very important in the software industry for the following 
reasons: 


Re-use of software 


Take the example of a computer system that has to control a VTR. The software has to
be designed to understand the various operations possible on a VTR. These may be
SHUTTLE FWD, PLAY, CUE etc. In object orientated design all these functions
would be implemented in a 'Class' of functions. This class may well be called 'VTR'.
Each of these functions understands what commands have to be sent to a device to
action that particular function.


If the system has to control a SONY BVW75, an 'object' is defined called 'BVW75'
which inherits the functions of the class 'VTR'. For just one device this has probably
meant that we have had to write more software than if we had simply designed something
to control a SONY VTR. Why bother?


If we now extend our control system to control an analogue laser disk such as a Pioneer
VDR 1000, using conventional design we would have to start from scratch, or perhaps
copy the code we wrote for the Sony VTR and rename it 'PIONEER'. This
works fine until we want to change the control protocol that is used to talk to
each device. We now have to change the code in two places. Also, if we have
made an error in the original SONY program, we have just copied the error into our
PIONEER program.


Object Orientated Design allows a new object to be created called 'PIONEER' which
inherits the properties of the class 'VTR'. These properties can now be customised to
allow for any changes we might have to make. For instance, the laser disk might take
several seconds to spin up when the disk is first inserted, whereas a tape is immediately
available as soon as it is inserted.


Easier to Test


This form of design makes large systems much easier to test. In our example the code
we inherited will already have been tested. If we have to make changes to the basic
functions or the protocol, we now only have to make them once, in a single place.
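

The VTR example might be sketched in Python as follows. The class and method names, the
text 'protocol' and the five second spin-up figure are all invented for the illustration;
no real Sony or Pioneer control protocol is implied. In Python the 'BVW75' and 'PIONEER'
objects described above are most naturally written as subclasses of VTR.

```python
class VTR:
    """Base class holding the operations common to every controlled device."""

    def __init__(self, name):
        self.name = name
        self.sent = []  # commands are recorded here rather than really transmitted

    def _send(self, command):
        # The control protocol lives in this one place only (hypothetical format).
        self.sent.append(f"{self.name}:{command}")

    def load_delay_seconds(self):
        return 0.0  # a tape is available as soon as it is inserted

    def play(self):
        self._send("PLAY")

    def cue(self, timecode):
        self._send(f"CUE {timecode}")

    def shuttle_fwd(self):
        self._send("SHUTTLE FWD")


class BVW75(VTR):
    """Inherits everything from VTR unchanged."""

    def __init__(self):
        super().__init__("BVW75")


class Pioneer(VTR):
    """Inherits the VTR functions and customises only what is different."""

    def __init__(self):
        super().__init__("PIONEER")

    def load_delay_seconds(self):
        return 5.0  # assumed spin-up time for a newly inserted disk


for device in (BVW75(), Pioneer()):
    device.cue("00:01:00:00")
    device.play()
    print(device.sent, device.load_delay_seconds())
```

A change to the protocol is made once, in VTR._send, and both devices pick it up; a
fault fixed in VTR is likewise fixed for every device that inherits from it.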


The User Interface 


The user-interface (sometimes called the graphical user interface, or GUI) is often
what makes or breaks a computer's usefulness. No matter how clever or powerful the
machine, if a human cannot easily operate it, it is not much use.


The big advance in computer user-interfaces was in creating the WIMP (Windows,
Icons, Menus and Pointers) style desktop. Contrary to popular belief this was actually
conceived by Xerox, not by Apple. The great step forward was to make everything
on the screen a ‘bitmap’, or a pixel by pixel representation. This enabled sections of the
screen to be moved around as a whole simply by copying blocks of memory.


This new way of working takes some getting used to. I expect most of you have now
used a mouse on a computer at some stage. Do you remember first getting used to
double-clicking or dragging and dropping windows? This has been the argument
often used over the last few years against using computers in the control room and
operational areas of TV stations. For a long time keyboards and mice were frowned upon
as being incapable of being used by control room operators. "What they need is big
chunky buttons, that still work when they've had coffee poured down them!" said one
Engineer.


Things seem to have started to change over the last couple of years though. Maybe this 
is because the senior Engineers themselves have become used to keyboards and mice 
when using them in an office environment. 


There is still a requirement though on the part of the designer to design the User
Interface properly. This may mean taking into account the fact that the users are not
regular users of a Windows type system, and therefore making it impossible for users to
move or hide vital parts of the screen. A good example of this is the Procion's desktop
'Workbench'.


Session 2


Broadcast Application of Computers 


The History of Computing in Broadcasting 


I thought it would be useful to look at just where computers first got "their foot in the
door" in broadcasting. As quite a lot of this is before my time, I asked a colleague who
had worked for the BBC to recall some of his earliest memories:


1960s 


BBC buys its first computer. An Elliott 803, 4KB memory, uses Magnetic Tape with
sprocket-holes.


1970 Election uses cardboard graphics to display the election results. 


1970s 


Second generation colour camera makes use of solid state logic for auto-registration. 
(not programmable devices) - blacked by the broadcast unions. 


NHK in Japan fully automate programme presentation using IBM mainframes. 


BBC Develop NICAM 3 which was used to multiplex 6 audio channels together for 
radio distribution. 


1974 Election uses Electronic character generator (ANCHOR) but this is built with
analogue circuits and is not a 'computer'. Cardboard still used in case of failure.


8-bit computers used to develop Teletext, moving to 16-bit for production use. 16KB
of memory, 2.5MB disk drives. First actual computers introduced into technical areas -
consequently blacked by the broadcast unions.


BBC World Service introduces electronic newsroom. No word processing, Word Wrap 
is hard coded into the VDU which has enough memory to allow scrolling. 


1979 Election uses computer generated graphics. Mechanical Swingometer replaced.


1980s


BBC Engineer designs the BBC Micro for a computer literacy programme. Estimated that
5,000 would be required. Later sold in six figure numbers. Built by Acorn. 8-bit. Max.
64KB memory.


Quantel appear on the scene. Micros now used in all kinds of devices: framestores and
synchronisers etc.


1983 Election uses Quantel Paintbox under computer control.
Channel 4 launches, making the most extensive use of computers in scheduling so far.


BBC moves into computer newsrooms with three different systems for Breakfast TV,
Radio and TV.


1987. BBC replaces sticky clouds with computer graphics and a telecoms link to the Met
Office Cray. Michael Fish wrongly forecasts the hurricane. (Met Office later use this to
justify buying a second Cray!)


1990s 


1992. BBC buy their first AVID Newscutter. Later used directly to air. 


Present Day Computer-based Systems 
Newsroom Systems 


Some of the largest 'conventional' computers in television are often to be found in the
Newsroom. Newsroom systems became popular as they allowed most of the operations
performed in the Newsroom to be concentrated in one system. These operations
range from archiving, word-processing and editing to time-keeping and
machine control of VTRs, CART machines and Autocues.


Many newsrooms would no longer be able to function using their old paper-based methods.
(Although journalists will probably tell you otherwise.)


The BBC alone has over 3000 terminals connected to its BASYS newsroom system.


Transmission Automation 


Automation has received a lot of attention mainly because of its failures rather than its
successes. That said, many TV stations now rely on large amounts of automation, and
could not operate any other way. Automation systems have proved that they are reliable
and useful broadcasting systems.


Some sceptics claim that you cannot replace several dozen
trained people in one go with a complex computer system and get it right first time,
because you can never fully specify what the original people did, or would do, in every
possible circumstance. I don't believe this is wholly true, but many automation systems
have become over-complicated by trying to mimic the way in which the
jobs used to be done by people.


One of the benefits of a computer control system is supposed to be that the functionality
is easy to change, because it only requires a change in the software. This only holds true
if the modified system can be tested in such a way that any unwanted effects (bugs)
can be identified. In many cases this is very difficult to do, as developers seldom have
access to a complete test system that is identical to the target system. There is a lot of
debate about the safety of the computer control system for the Sizewell B nuclear power
station. Strictly speaking, if any change is made to a system, everything needs to be
tested from scratch, as a simple change may have a profound effect on the software as a
whole. In the case of the nuclear power station, or other systems which are classed as
‘safety critical’, this testing could even stretch to mathematically proving that the
underlying algorithms and logic in the system are not flawed.


The KISS rule should always apply. (Keep It Simple - Stupid!) 


Non-Linear Editing 


Non-linear editing and disk-based playout systems are noteworthy examples of how
mainstream computer technology is directly affecting broadcasting.


Quantel had already been able to graphically manipulate uncompressed video with
systems like 'Harry', which was capable of on-line editing with the greatest of ease. A
Harry system could not though be described as mainstream computer technology; it
relied on a lot of custom hardware. The recent advances began with off-line edit
systems from Avid and Lightworks. Avid's system was based on Mac technology and
Lightworks' was based on the PC.


The computers were used to store compressed versions of the original source material.
These images could then be graphically manipulated on-screen, and sequences edited
together without destroying the compressed copies of the pictures, or the original source
material, still on the shelf.


The systems are said to be non-linear because the disks have random access to all the 
material; it is not necessary to shuttle from one end of a linear tape to another. 


Once an edit was finished an EDL or Edit Decision List could be created, and this was
used to control a conventional on-line edit suite, which used the original source tapes
and VTRs.
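

The idea of an edit decision list can be sketched as a simple data structure: each event
records only which source reel to use and the in and out timecodes, so the material itself
is never altered. The field names, the 25 frames per second rate and the record start time
below are assumptions made for the illustration; this is not the file format of any
particular edit controller.

```python
FPS = 25  # assumed frame rate; not stated in the text


def tc_to_frames(timecode):
    """Convert 'HH:MM:SS:FF' to a frame count."""
    hh, mm, ss, ff = (int(part) for part in timecode.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * FPS + ff


def frames_to_tc(frames):
    """Convert a frame count back to 'HH:MM:SS:FF'."""
    ff = frames % FPS
    seconds = frames // FPS
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}:{ff:02d}"


def build_edl(cuts, record_start="01:00:00:00"):
    """cuts is a list of (reel, source_in, source_out) chosen in the off-line edit."""
    events = []
    record = tc_to_frames(record_start)
    for number, (reel, src_in, src_out) in enumerate(cuts, start=1):
        length = tc_to_frames(src_out) - tc_to_frames(src_in)
        events.append({
            "event": number,
            "reel": reel,
            "src_in": src_in,
            "src_out": src_out,
            "rec_in": frames_to_tc(record),
            "rec_out": frames_to_tc(record + length),
        })
        record += length
    return events


for event in build_edl([("TAPE1", "00:00:10:00", "00:00:12:12"),
                        ("TAPE2", "00:05:00:00", "00:05:01:00")]):
    print(event)
```

Re-ordering or trimming a sequence only changes entries in this list, which is why the
compressed copies and the source tapes are left untouched.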


The next significant step came when the compression and decompression technology
had progressed to the stage where the video could be decompressed in real-time and
used directly on-air. Off-line became on-line overnight with a simple upgrade of a
compression board in the Mac. Many people have been scathing about the quality of
such decompressed pictures (yet even companies like the BBC now use the technology
direct to air for news), but one thing is sure: the picture quality is set to get better.


The next big hurdle to be overcome is to provide usable high-bandwidth networking
and central file server systems so that once a piece of video is in the central system it
need never leave until it is ready to be played to air. AVID have said they are using
ATM and Fibre technology to produce their AVIDNet system, in conjunction with a
Silicon Graphics Challenge server. Quantel too, I believe, are working with Silicon
Graphics.


Broadcasting has always been an industry which required special hardware to do
specific jobs. AVID, perhaps the leader in supplying this technology, would have you
believe that the days of specialist hardware and tape-based technology are over, and
everyone will be using their technology before the end of the century.


Conclusions 


Future Directions 


We have looked at present-day and past computer technology. Quite often technology 
that has been used in today's computer systems finds its way into tomorrow's broadcast 
systems. RAID is such an example. 


We have already covered a lot of ground in a short space of time, and it would be
difficult to look at all aspects of computing to give a complete picture of where we
might find computers in the industry towards the end of the century. If we look though,
at what the next generation of processors will be able to do, we might get an idea of
what computers will be doing in the future.


Recent Processor Developments


This year's latest developments in micro-processors were unveiled at the
Microprocessor Forum in October. Some of the promised new devices are already
available in sample form and are going to push standard computing hardware even further
into the broadcast industry. Here are some of the ones to watch for:


SUN UltraSPARC. 


This is the first general purpose processor to include instructions that support
multimedia type operations. It is internally designed so that it can process 8 pixels at
once (for MPEG decompression for instance). This processor is set to re-write the
specifications for video performance on workstations.


MIPS T5


MIPS, the people who make the processors in Silicon Graphics boxes, have announced
their next generation processor, the T5. T is for Terminator. MIPS are also working on
low-power versions of their R4000 series devices currently used in SGI boxes. These
will probably be in the next generation of Game-Boy.


DEC Alpha 21164


Regarded now as the world's fastest microprocessor, and the first to achieve over 1 billion
instructions per second (1.2 BIPS). It will run at 266 or 300MHz. In benchmark terms
it achieves 330 SPECint92 and 500 SPECfp92, which is three times the integer
performance of a Pentium @ 100MHz, and 66 percent more than a MIPS 8000/8010, which
is designed specifically for floating point work.


IBM/Motorola PowerPC 620


A new top-of-the-range device from the people who have sold more RISC chips than
anyone else.


Convergence 


There is lots of talk in the computer industry about ‘Convergence’. This has nothing to do
with how well their computer monitors are lined up! The convergence is between
computing, telecommunications and broadcasting. We already have examples of how
broadcast companies are getting into mainstream telecoms, cf. NTL. Conversely,
telecoms and computer companies are getting into television. The convergence is not
just of the various businesses, but of the technology too.


As disk-based broadcasting is taking off, the companies best placed to take advantage
of this are the companies who already have the correct skills, and who also understand
broadcasting technology. Companies like HP, SUN and SGI are, in my opinion, torn
between following the multitude of commercial video-on-demand projects and getting
into Broadcast applications properly. As a result the hardware is there, but the hardware
vendors are not developing specific broadcast applications themselves; this is left to
small third-party suppliers.


The success of the next generation of computer-based broadcast systems
will rely on the quality of the software controlling them. As we have seen, in terms of
hardware technology a lot is already available. We have the processors, storage,
busses and networks to make it happen. The hardware needs to be exploited by well
specified, well designed and robust software control systems if we are to reap the
benefits.


Ian Cockett
November 1994 


Broadcast and Multimedia Group 


Architecture - Block Diagrams


[Figure 3-1. TMS32010/C10/C15/E15 Block Diagram]


[Figure: block diagram reproduced from the MC68030 User's Manual (Motorola)]


[Figure: chart of microprocessor generations and table of processor benchmarks. Source: BYTE]


[Figure: processor local bus diagrams showing the VL-bus and PCI bridges, with ISA and graphics]

Lecture 9 


21 NOVEMBER 1994


AUDIO 


Robin Caine (Pro-Bel) 



RTS Training - Audio 
Digital Audio Fundamentals 


Sampling 
The sampled sinewave has many double-sideband components - filtering restores the original
completely, regardless of jitter, provided that the Nyquist criterion is met. This is NOT
quantisation noise.


Quantising
Quantisation noise is the difference between the sampled levels and the nearest digital number.
The number of bits determines the QN. Measured noise level depends on QN, modulation depth,
and analogue noise added.
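

The dependence of QN on the number of bits can be put into figures with the standard
textbook result for an ideal converter and a sine wave: the signal-to-quantisation-noise
ratio at full scale is about 6.02N + 1.76 dB for N bits. The sketch below assumes an
ideal, undithered quantiser and ignores the analogue noise mentioned above; the function
name is invented.

```python
import math


def quantisation_snr_db(bits, level_dbfs=0.0):
    """Ideal SNR for a sine wave at level_dbfs (0 = full scale) in an N-bit system."""
    full_scale_snr = 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)
    # The noise floor is fixed, so lowering the modulation depth lowers the SNR.
    return full_scale_snr + level_dbfs


for n in (16, 18, 20):
    print(n, "bits:", round(quantisation_snr_db(n), 1), "dB")
```

Each extra bit is worth about 6 dB, and a signal lined up 18 dB below full scale gives up
18 dB of that ratio.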


Modulation Depth

An ADC has a maximum input level corresponding to the largest digital number, or "maximum
code". This maximum level is called "0dB FS". The EBU specifies a maximum code of 0dB FS
when the analogue input is +18.06dBu. The SMPTE specifies +20dBu. However, US equipment
is often set up so that line-up is at +4dBu to start with. Thus, some VTRs with ADCs and DACs
built in reach 0dB FS at +24dBu in.
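

A short worked example shows why the depth has to be agreed: the same analogue level lands
at different digital levels depending on which full-scale alignment the converter uses.
The sketch uses the three full-scale figures quoted above and the usual definition of
0dBu as about 0.775V RMS; the function names are invented.

```python
import math

DBU_REFERENCE_VOLTS = math.sqrt(0.6)  # 0 dBu, about 0.7746 V RMS


def dbu_to_volts_rms(level_dbu):
    """Convert a level in dBu to volts RMS."""
    return DBU_REFERENCE_VOLTS * 10 ** (level_dbu / 20)


def dbu_to_dbfs(level_dbu, full_scale_dbu):
    """Digital level produced by an ADC whose maximum code is at full_scale_dbu."""
    return level_dbu - full_scale_dbu


for name, full_scale in (("EBU", 18.06), ("SMPTE", 20.0), ("+24dBu VTR", 24.0)):
    print(name, "tone at +4dBu codes at", round(dbu_to_dbfs(4.0, full_scale), 2), "dB FS")
```

A +4dBu tone therefore sits about 14 dB below maximum code on an EBU-aligned converter but
20 dB below it on the +24dBu machine.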


NOTE: Modulation depth is not a matter for personal choice any more than line-level in a system. 
If a broadcast centre is to interface its signals effectively, the depth must be standardised. 


Time and Amplitude (Jitter) 

Timing inaccuracies at the ADC or DAC introduce noise, because the voltage at the sampling 
instant is changing. At 20 kHz and 0dB FS a few tens of picoseconds is the maximum to get 
20-bit accuracy. 
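

The order of magnitude can be checked with a simple calculation. A full-scale sine wave
changes fastest at its zero crossing, so the allowable timing error is the voltage error
we can tolerate divided by that maximum slope. The sketch below uses one particular
criterion, an error of half a least significant bit, which is an assumption of the
example; looser criteria give proportionally larger figures.

```python
import math


def max_timing_error_s(freq_hz, bits, lsb_fraction=0.5):
    """Timing error that moves a full-scale sine by lsb_fraction of an LSB at its steepest point.

    For amplitude A the peak slope is 2*pi*f*A and one LSB is 2*A / 2**bits,
    so A cancels and the result depends only on frequency and word length.
    """
    return lsb_fraction * 2.0 / (2 ** bits * 2 * math.pi * freq_hz)


for n in (16, 18, 20):
    print(n, "bits:", round(max_timing_error_s(20_000, n) * 1e12, 1), "ps")
```

On this criterion the figure at 20kHz falls from about a hundred picoseconds at 16 bits to
less than ten at 20 bits, which shows why the last two bits are so hard to achieve.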


NOTE: This is a separate matter from the jitter requirements of bitstreams. The quality of the 
DAC depends on the jitter rejection of the phase-locked-loop or other timing recovery circuit. 


It is difficult and expensive to get from 18 bit to 20 bit theoretical noise because of the clock 
purity required. Modern ADC chips code to 20 bit accuracy in quantisation noise terms, but this 
noise, which can sound unpleasant, is masked by the clock jitter noise. This is inherently of a 
random analogue nature and effectively dithers the signal, so that the ADC really is 20 bits 'good' 
but has 18 bits, or so, level of measured noise. 


The timing associated with the digital signal is an analogue component, which must be
transmitted along with the digital numbers or possibly derived another way. However, noise is
dependent on bandwidth, so the key to reducing the clock jitter in the DAC is to use a recovery
circuit of the lowest practical bandwidth.


Instead of recovering the clock from the transmitted signal, it is possible to run the DAC from the 
same master clock source as the ADCs, if the system is synchronous. Then the DAC performance 
does not depend on its recovery characteristics. 


Similarly, CD players run the DAC from a clean clock, and control the disc speed to provide the
data at the time required.


Filters
First generation ADCs sampled at 48kHz and needed sharp analogue filters with good rejection at
24kHz. These had to be phase-equalised to prevent ringing, otherwise some signals, e.g. square
waves, caused clipping before the expected overload level. Filter design also meant that they could
not handle the full level close to the band edge anyway. These analogue filters had delays of four
to five hundred microseconds. The rejection of the filter above 20kHz determines the level of alias
components.


Oversampling
Modern ADCs all use oversampling converters, which "achieve increased resolution not by
decreasing the error between the analogue input and digital output but by making the error occur
more often" (Robert Adams, 1986). The technique involves low resolution, high frequency sampling
with feedback shaped to push the noise above 20kHz. When this is followed by a digital filter,
converters effectively free from the old problems of linearity and monotonicity are easily
reproducible.


Transversal digital filters have sharp cut-off without phase distortion but still have delays in the 
hundreds of microseconds. The higher the sample frequency, the shorter the delay (an advantage 
of 48kHz over 44.1kHz). Even these can overload due to ripples in response, and a small loss in 
level is sometimes incorporated. 


AES3 Digital Audio Interface 


A Subframe contains one sample, left OR right (A or B) 
NOTE: 'Validity' bit is ZERO for VALID. 
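

As an illustration of what one subframe carries, the sketch below assembles the 28 data
bits that follow the preamble: four auxiliary bits and 20 audio bits (sent least
significant bit first), then the validity, user, channel status and parity bits. The
function name and arguments are invented, the sample is passed as an unsigned 24-bit
pattern, and the preamble itself is left out because it is not ordinary biphase-mark data.

```python
def build_subframe_bits(sample_24bit, valid=True, user=0, channel_status=0):
    """Return the 28 data bits (time slots 4 to 31) of one AES3 subframe."""
    if not 0 <= sample_24bit < (1 << 24):
        raise ValueError("sample must be an unsigned 24-bit pattern")
    # Slots 4-27: the 24-bit field, least significant bit first.
    # The first four of these are the auxiliary bits, the next twenty the audio word.
    bits = [(sample_24bit >> i) & 1 for i in range(24)]
    bits.append(0 if valid else 1)   # slot 28: validity, ZERO means valid
    bits.append(user & 1)            # slot 29: user bit
    bits.append(channel_status & 1)  # slot 30: channel status bit
    bits.append(sum(bits) % 2)       # slot 31: parity, making the number of ones even
    return bits


subframe = build_subframe_bits(0x000010, valid=True, user=0, channel_status=1)
print(subframe)
print("ones:", sum(subframe))
```

One channel status bit travels in each subframe, which is why it takes a whole block of
frames to deliver the complete channel status message.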


A Frame is a pair of subframes occupying one sample period, 20.8us at 48kHz. The left, or 'A',
sample is sent first but the two are sampled at the same instant.


A Block is 192 frames, containing 192 bits, or 24 bytes, of channel status. User bits may or may
not use the block start for synchronisation. There are two channel status and two user channels in
an AES3 bitstream - one with each audio channel. Readers usually only look at one as they are
usually the same.
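

The figures above follow directly from the sample rate, as this short calculation shows.
It assumes the standard AES3 subframe of 32 time slots (4 for the preamble and 28 for
data); the function and constant names are invented.

```python
SUBFRAMES_PER_FRAME = 2
SLOTS_PER_SUBFRAME = 32  # 4 preamble slots plus 28 data slots
FRAMES_PER_BLOCK = 192


def aes3_timing(sample_rate_hz):
    """Derive the basic AES3 timing figures from the audio sample rate."""
    frame_period_s = 1.0 / sample_rate_hz
    bits_per_frame = SUBFRAMES_PER_FRAME * SLOTS_PER_SUBFRAME
    return {
        "frame_period_us": frame_period_s * 1e6,
        "block_period_ms": FRAMES_PER_BLOCK * frame_period_s * 1e3,
        "bit_rate_bps": bits_per_frame * sample_rate_hz,
        "channel_status_bytes": FRAMES_PER_BLOCK // 8,
    }


for rate in (48_000, 44_100):
    print(rate, aes3_timing(rate))
```

At 48kHz a block lasts 4ms and the data rate is 3.072Mbit/s, so a complete channel status
message can only change 250 times a second.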


Preambles and Biphase-Mark 

Biphase-mark was chosen firstly because it works the same with either polarity. The parity bit, by
making the total number of ones even, always sets the preamble starting on the same edge. X
and Y preambles come before the left or 'A', and right or 'B' channel respectively.


NOTE: There is no block start for the B channel - it relies on the X preamble being replaced by a
Z in the X position.
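

Both properties, polarity independence and the effect of even parity, can be demonstrated
with a small biphase-mark coder. In this sketch every bit cell starts with a transition and
a 'one' has an extra transition in mid-cell; each cell is represented by two half-cell
levels. Preambles are not modelled and the function names are invented.

```python
def biphase_mark_encode(bits, start_level=0):
    """Encode bits as a list of half-cell levels (two entries per bit)."""
    level = start_level
    halves = []
    for bit in bits:
        level ^= 1           # transition at the start of every bit cell
        halves.append(level)
        if bit:
            level ^= 1       # extra mid-cell transition marks a one
        halves.append(level)
    return halves


def biphase_mark_decode(halves):
    """A one is any cell whose two halves differ; absolute polarity is ignored."""
    return [1 if halves[i] != halves[i + 1] else 0 for i in range(0, len(halves), 2)]


data = [0, 1, 1, 0]
line = biphase_mark_encode(data)
inverted = [1 - level for level in line]  # e.g. the two wires of the pair swapped
print(line)
print(biphase_mark_decode(line), biphase_mark_decode(inverted))
```

With 28 data bits and an even number of ones there is an even number of transitions in
total, so the line finishes each subframe at the level it started from and the following
preamble always begins on the same edge.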


Channel Status 
Channel Status carries information to define the use of the signal e.g. stereo or mono, flat or
pre-emphasised, sample rate, source and destination labels etc. In practice, only a few of the
specified facilities are in common use. The Professional/consumer bit, the first bit, is used by
VTRs with the last byte, a CRCC, to verify correct AES3 input. Pre-emphasis and sample rate are
detected by DAC filter chips to correct automatically. A few users have made use of source labels
and the sample address code.


Problems usually occur because the AES allows three levels of implementation. The 'Minimum' is
that the 'Professional' bit is set and nothing else at all, allowing the simplest possible designs.
VTRs usually check the CRCC in byte 23, and if it is not there they flag an error.


The second level is called 'Standard' and requires the CRCC and up to all the flags allowed in the
first three bytes. One design of mixing desk will not accept 'pre-emphasis not indicated', which is a
legal flag, and requires a specified pre-emphasis type.


'Enhanced' channel status includes any combination as long as the CRCC is present. 


Consumer channel status starts with a zero bit and has no CRCC. It indicates source type, e.g. CD 
or RDAT, and has a copy prohibit flag. 


User Bits 

The User bits are for the user to specify. Any pattern of bits is legal by definition, and no format 
need be followed. However, there is a flag in channel status to indicate certain options for user 
bits. These are that the format is based on the block structure of 24 bytes, or is unformatted, or is 
the bitstream pattern defined in AES18, a variant of the HDLC X25 protocol, so arranged that 
timecode can be indicated precisely and other data carried in a manner independent of bitrate. 


Auxiliary bits 

These are the four least significant bits of the audio sample field, below 20 bits. Sometimes used
for longer audio words, where the result of a fader multiplication is preserved without rounding
pending further processing, these bits can also be used as preferred by the user for such things as
low-bit-rate talkback, or for carrying gain information as required in digital audio broadcasting.
Usually they are set to zero.


Signals on Cables 


The second advantage of biphase-mark is that the energy is concentrated at the two frequencies of 
1.5MHz for zeroes and 3MHz for ones. Thus a bandwidth of 100kHz to 6MHz is sufficient to 
preserve the waveform. 


AES3 specifies 110 Ohms for transmitters, receivers and cables. The 1992 version requires 
balancing resistors to equalise the currents and these reduce radiation by about 20dB. The 
addition of a filtering capacitor to remove components above the required 6MHz also helps 
considerably. 


The general rule is that any discontinuities in a cable run such as jackfields must be within 3 or 4 
metres of either end. Above all, no mis-terminations must occur at distances of around 15m 
because this is the length at which the ones are cancelled by reflection and the zeroes are not. 
XLRs, 'B' gauge jacks, multipole connectors such as D-type and Varicon and others are 
acceptable. 


The slides show a number of waveforms of real AES3 signals. To have sufficient eyeheight a 
signal of at least 1 Volt p-p will be necessary. This depends on the drive level as well, because the 
p-p signal will be determined by the lowest frequency component, which suffers less cable loss, 
while the higher frequency will suffer a greater relative loss. Thus a high drive level into a longer 
or lossier cable will have a smaller eye than a low drive level into a short cable giving the same p-p 
voltage.




Cable of the wrong impedance is in any case useless. The slide of 2m of 'star-quad' shows how 
quickly the waveform becomes distorted. 


Jitter 

The bitstream should have edges corresponding to a 6MHz clock, but will generally have some
variation from this, called 'jitter'. Where the deviation occurs at a low frequency, below about
50kHz, it may be due to data coming from a tape transport erratically, or perhaps due to the
response of a phase-locked loop. When a number of devices are locked to one another in
cascade the system can exhibit jitter gain, where the loops all have the same slight lift in response.
(This is a good reason to lock everything to a central reference.) High frequency jitter can occur
when the received waveform is distorted by cable loss and the receiver slices the waveform
differently for zeroes than for ones (data jitter). However, this jitter is not inherent in the
waveform - it is induced by the receiver and generally one receiver will be different from another.
For this reason, a measuring device cannot give an accurate jitter figure because it will be different
when the normal receiver replaces the measuring device. The effect of jitter can be to induce
noise into the DAC, more serious when the jitter has a single frequency. (A high quality DAC
phase locks its clock to measure edges, free of data jitter.) However, the effects are smaller than
predicted and not usually noticeable. If the jitter causes bit errors then it will be outside the
specified 20ns limit.


AES3 on Coax 

In North America especially, there is a preference to convert AES3 to 75 Ohm coax. with BNC 
connectors, and to handle the signal as video. Virtually no digital audio equipment has this 
interface supplied and adapters using resistive pads or transformers are used. Because the send 
level is reduced to 1 Volt p-p and the receiver matching may be lossy, the total attenuation 
permitted is much less than for the standard balanced case. It is the size of the cable rather than 
the unbalanced nature which allows greater distances compared with balanced. 

The slides show much neater waveforms from resistive adapters than from transformer adapters.
However, if the equipment does not have transformers built in, such as SPDIF receivers, then the
transformer will cure common mode problems.


Other interfaces exist - 

SPDIF is very similar to AES3 but uses 75 Ohm screened cable and 'phono' connectors for short 
distances up to about 20m. It will carry much further on good 75 Ohm coax. The other difference 
is that it is usually associated with "Consumer" channel status which defines sources as CD 
players, RDATs etc. and carries a copy prohibit flag.


SDIF-2, ProDigi and Yamaha. SDIF-2 is quite different, using separate 75 Ohm coax cables with
BNC connectors for left, right and wordclock. It may also appear on a D-type connector on
multitrack machines, and is similar to, but not interchangeable with, the Mitsubishi ProDigi interface.
Yamaha uses DIN connectors and balanced signals in series, with separate wordclock on a BNC.


MADI is a multiplex of 28 synchronous AES3 signals carried on an asynchronous digital bearer at
125Mbit/s. This allows speeding up to full 54kHz varispeed or expansion to 32 stereo. Although
that is outside the spec., the first 28 will still connect to a strict MADI receiver. All the electrical
and optical features are as for "FDDI", the coax being rated to 50m and optical much further. More
on MADI later.
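
The headroom that permits varispeed or extra channels can be checked with a little arithmetic. The sketch below assumes that each channel occupies 32 bits per sample period and that the FDDI-style 4B/5B line code leaves four fifths of the 125Mbit/s line rate for data; neither figure is stated in these notes, so treat both as assumptions.

```python
LINE_RATE = 125_000_000           # bit/s on the link, from the text
DATA_RATE = LINE_RATE * 4 // 5    # assumed 4B/5B coding: 100 Mbit/s of payload
BITS_PER_CHANNEL = 32             # assumed: one AES3-style subframe per channel

def payload(channels, sample_rate):
    """Payload bit rate needed for a number of mono channels."""
    return channels * BITS_PER_CHANNEL * sample_rate

cases = [
    ("28 stereo at 48 kHz", 56, 48_000),
    ("28 stereo at 54 kHz varispeed", 56, 54_000),
    ("32 stereo at 48 kHz", 64, 48_000),
]
for label, channels, fs in cases:
    rate = payload(channels, fs)
    print(f"{label}: {rate / 1e6:.3f} Mbit/s, fits: {rate <= DATA_RATE}")
```

Under these assumptions all three cases fit within the available data rate, which is consistent with the statement that either full varispeed or 32 stereo channels can be carried, but not necessarily both at once.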

Timing and delays


Lipsync problems are of the order of tens of milliseconds. The use of TV frame delays in various 
processes causes audio and video to get out of step. Coding audio and / or video to MPEG 
standards also leads to uncontrolled variations in delay. New work is going on to add 
'Clapperboard' flags and delay indicators to bitstreams to allow correction at the end of the 
system. 


Lesser delays, up to 1ms, are a problem when mixing related signals, where severe frequency
response effects can occur. Mixing clean atmosphere with a commentator's mic is an obvious
example. An ADC and DAC together give about 1ms of delay.


Smaller delays still, of the order of a few samples, don't matter EXCEPT when variations occur 
causing the number of samples to change abruptly. This produces clicks or bitstream disruptions 
often with extended effects downstream. While not disastrous, clicks of the order of -45dB are 
NOT acceptable when they are consistently induced by the system design. System components 
are required to control these effects such as reframers and delay units designed in conjunction with 
the digital audio timing reference standard AES11. 


AES11 Synchronisation 

It is not essential or fundamental, but generally accepted that a digital audio system should be 
isochronous as the easiest and most satisfactory approach. All that's needed to lock everything 
together is a single frequency, e.g. a square wave at sample rate. AES11 has the same format as 
the digital audio, so that equipment can be locked to any incoming signal, but with tighter 
tolerances.


The arrow on the right indicates the phase of the reference signal. Imagine that an incoming signal 
is at very slightly the wrong frequency. Its phase angle relative to the reference will be rotating 
slowly round the circle. At some point on the circle the system will slip a sample, dropping one if 
the input is fast or repeating if it is slow, at which point the number of samples within the receiver 
will change abruptly. By constraining all the outputs to have their phase within the range of ±5%
or 18 deg. of the reference it is possible to demand that all receivers slip samples on the far side of
the circle, by stating in the standard that they must accept signals within ±25% or 90 deg. without
change of delay.
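
The two limits can be expressed as a simple check on the timing offset between a signal and the reference. This is only a sketch: the function names are hypothetical, a 48kHz sample rate is assumed, and the offset is taken as already measured in seconds.

```python
def phase_fraction(offset_s, sample_rate=48_000):
    """Offset from the reference as a signed fraction of one frame, -0.5 to +0.5."""
    frac = (offset_s * sample_rate) % 1.0
    return frac - 1.0 if frac >= 0.5 else frac

def phase_degrees(offset_s, sample_rate=48_000):
    """The same offset expressed as an angle on the phase circle."""
    return 360.0 * phase_fraction(offset_s, sample_rate)

def output_ok(offset_s, sample_rate=48_000):
    """Output limit from the text: within +/-5% (18 deg.) of the reference."""
    return abs(phase_fraction(offset_s, sample_rate)) <= 0.05

def receiver_must_hold_delay(offset_s, sample_rate=48_000):
    """Receiver limit from the text: no change of delay within +/-25% (90 deg.)."""
    return abs(phase_fraction(offset_s, sample_rate)) <= 0.25

for offset_us in (1, 2, 5, 10):
    t = offset_us * 1e-6
    print(offset_us, "us:", round(phase_degrees(t), 1), "deg.",
          output_ok(t), receiver_must_hold_delay(t))
```

At 48kHz one frame lasts about 20.8 microseconds, so the output window is roughly ±1 microsecond and the receiver window roughly ±5 microseconds.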


Video phasing 

In practice the AES11 audio reference is not used everywhere. In a video plant a central SPG
feeds colour black to all the video equipment. AES11 may be fed from a local video-to-AES11
converter or in a few cases from a 'wordclock' converter rather than a distributed reference.
Unfortunately, devices locked to video cannot meet the phase requirement above, so it can happen
that the audio from a DVTR will have a phase angle on the forbidden side of the phase circle. If
this corresponds with the sample slip point then the slightest jitter will cause alternate repeats and
drops, which is known to have the effect on a recording DVTR of appearing asynchronous,
causing it to stop.


To solve this problem, the EBU proposed an arbitrary phase relationship between colour black 
and AES11, that preamble X (or Z) aligns with the leading edge of line 1. Now it is possible for 
the audio output of the DVTR to be guaranteed to be on the right side of the circle. This solution 
is quite satisfactory for 625/50 systems, where there are 1920 samples per video frame, but in 525
there is an integral number of samples only in every 5 frames. Thus although all the colour black
signals are in phase, the AES11 and indeed any device which generates audio from video reference
will have only a 1 in 5 chance of locking up in the same phase as any other.
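
The arithmetic behind the five-frame sequence is easily verified. The sketch assumes 48kHz audio and takes the 525-line frame rate as 30000/1001 frames per second, a figure not quoted in these notes.

```python
from fractions import Fraction

def samples_per_frame(sample_rate, frame_rate):
    """Exact number of audio samples in one video frame."""
    return Fraction(sample_rate) / frame_rate

pal = samples_per_frame(48_000, Fraction(25))             # 625/50
ntsc = samples_per_frame(48_000, Fraction(30_000, 1001))  # 525/59.94 (assumed rate)

print("625/50:", pal)                          # a whole number of samples
print("525:   ", ntsc, "=", float(ntsc))       # not a whole number
print("frames needed for a whole number of samples:", ntsc.denominator)
print("samples in that sequence:", ntsc * ntsc.denominator)
```

The denominator of the exact ratio is the length of the frame sequence, hence the 1 in 5 chance of two independently locked devices agreeing.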

Hysteresis


A small hysteresis is inherent in the receiver design. Ideally it should approach the full 180 deg. It
could be made to be more than 360 deg. - then the phasing problem would go away, but there
would still be uncertainty about the number of samples.


Effect of sample delays 

Why does it matter? This is an example of a typical presentation/playout system. In each of these
mixer functions there are three samples of delay. Cross-fading mixer 2, from mixer 1 output to the
matrix output, will cause a frequency response effect, but much worse is the effect caused by the
automation system when the control system reassigns the matrix and does a fast reverse crossfade
between two sources of the same signal. Then a fade which should be inaudible gives a very
distinct disturbance. A delay module in the path will correct this, but it relies on the delays
remaining constant. Similarly, the presentation operator may wish to go to rehearsal after a
programme junction and re-route the playout machine via a bypass path. After the playout of the
next programme has started he will cut to the bypass path, which must be absolutely clean,
otherwise distinct clicks will occur in the title music of every programme!


Routing 


Transparency 

Digital Routers are NOT 'like a piece of wire'. The specifications of digital signals demand that all
signals are launched meeting tight tolerances and must be recoverable after considerable 
distortion. If the router were a piece of wire then multiple passes would lead to uncontrolled path 
lengths and degradation with consequent errors. 


Reclocking 

At the very least the signal must be re-sliced and raised in level to the required output amplitude.
Much better is to reclock the signal, that is to recover the highest frequency from the bitstream
and regenerate each pulse, delaying it by only half a clock cycle - say 160ns for AES3. A simple
crosspoint router can do this and still be able to pass asynchronous or different sample rates.


Reframing 

Reframing is the process by which all outputs are re-timed to the AES11 reference signal, delaying 
each input by up to one sample to realign them. The slide shows the effect on a bitstream of a 
simple crosspoint switch between sources A and B. The raw switched output has a serious failing 
in that there is an AES frame of the wrong length. This not only causes an error in a following 
receiver but may have quite indeterminate effects. It may disturb phase-locked loops, extending 
the error from one sample to several milliseconds or cause a monitoring system to log and print 
errors every time the switch changes. Typically, it will mute a DAC for a period of several
milliseconds.


Reframing contains the problem by patching up the bitstream, using the AES11 reference timing
information, so that while the output still contains a faulty sample the clock and framing are
restored. Thus no downstream effects are propagated. This process can be built into a crosspoint
router but is expensive and may only be needed on outputs which may be heard switching 'on air'.
Assignment routes do not need it, so outboard reframers on critical outputs are more cost
effective. Clearly, the reframer will restore any bitstream not compliant with AES11
timing to correct phase, and these devices go a long way to clear up the phase problem on existing
DVTR outputs.


NOTE that no attempt must be made to correct the AES block length. It was established in the
standard at an early stage that the delays involved in block phasing are too long and that blocks
are only relevant to channel status and user bits. The audio must never be compromised by block
considerations.


A crosspoint router is well suited to assignment switching in an asynchronous environment. 


TDM 

In a synchronous or isochronous environment time division multiplex techniques are appropriate. 
In this architecture all inputs, whether analogue or digital, AES or other format are converted to a 
digital bus directly compatible with the AES3 bit structure. Input cards can be ADCs or AES3 
receivers, or 2048kbit/s telecom format modules. Each source occupies one timeslot in each 
sample period, up to a total of several hundred. A 2-page RAM under processor control re-orders
the timeslots in each sample period to feed a similar range of output formats. Thus any source can
be routed to any output regardless of the signal format. A key feature of this architecture, however,
is that each AES3 output will be an error-free reframed output because the AES frames read from the
RAM must constitute a coherent bitstream, with samples from the first source followed neatly by 
those from the following source at any switch. Where an ‘on-air’ switch occurs between two 
sources of the same signal, it is impossible to tell that a switch has taken place, provided that the 
delays have been equalised. 
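
The timeslot re-ordering can be pictured with a small model. This is a sketch only: it interprets the '2-page RAM' as a double buffer in which one page is written with the current sample period while the other is read in routed order, which is an assumption, and the class and method names are hypothetical.

```python
class TdmRouter:
    """Toy model of a timeslot interchange using a two-page sample RAM."""

    def __init__(self, slots):
        self.pages = [[0] * slots, [0] * slots]
        self.write_page = 0
        self.route = list(range(slots))   # route[output] = source timeslot

    def set_route(self, output, source):
        self.route[output] = source

    def sample_period(self, inputs):
        """Write this period's timeslots; read the previous period re-ordered."""
        self.pages[self.write_page][:] = inputs
        read = self.pages[self.write_page ^ 1]
        outputs = [read[src] for src in self.route]
        self.write_page ^= 1              # swap pages for the next period
        return outputs

router = TdmRouter(3)
router.set_route(0, 2)                    # output 0 now takes source 2
print(router.sample_period([10, 20, 30])) # previous page is still empty
print(router.sample_period([11, 21, 31])) # samples from the first period, re-ordered
```

Because every output is always read as a complete timeslot from a complete page, a change of route takes effect on a sample boundary and the outgoing frames remain coherent.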


MADI 

This TDM has its forte in format conversion, while being a central routing switcher. MADI has 
the same advantages in terms of clean switching, plus the advantage of compact distributed 
routing, in which the 'fast' bus of TDM is replaced by a fibre no longer restricted to a few bays in 
size. 


MADI was of course contrived for connecting audio desks to multitrack recorders. It also made a
useful standard for the purpose of connecting stage microphones to remote vehicles without
earthing problems, but in broadcasting it has found a real advantage in an AES3 compatible form of
router. A full MADI mux and demux fits in a 3U frame with dual power and fibre converters.


Connected back-to-back this provides a compact 28 square stereo router by means of a control
card which re-shuffles the MADI timeslots between input and output. The cost of this form of
router is similar to a simple crosspoint type.


Much more than that, however, one of these frames can serve one or two studios with 28 stereo 
inputs and outputs, and all the signals can be carried to a central switch on two fibres. Eight such 
pairs add up to a 224 square stereo router occupying only 6U for the router plus 6U for fibre 
conversion and distribution. With outboard ADCs and DACs only used as necessary this is the 
ideal isochronous router for television sound. 


It is important to realise that these techniques are NOT packet switches, each sample having a 
predetermined timeslot and preserving its original timing free of justification and buffer jitter. The 
disadvantage of using packet switching of time-related data such as audio and video lies in a loss 
of the original timing information, with consequent large increases in delay due to large buffers 
needed to handle the packet queuing processes. 

Embedded Audio

Why bother with routing audio at all when you can embed it in the video and forget all about it?
The main output from the studio or any long runs may well be embedded with every advantage,
but in post production, audio will still need mixing and processing in a different manner from the
video. The choice will be dominated by the relative cost of embedding audio and extracting it for
each operation versus the cost of the AES3 level of router to support separate audio. With a 28 x
28 stereo router (MADI above) with clean switching and precise timing and delays costing around
£10k, the number of embedders and extractors at £1k or so each shows that this is a well-balanced
equation. It may well be that technical features will determine the choice after all.


Latency-delay 

A crosspoint router has a delay under 1µs, and a MADI or TDM about four samples. Embedding
and extracting AES3 from digital video has a delay of 64 samples or 1.28ms, enough to be well into
the second group of delay problems, where one such may be acceptable but several will add up
to an intolerable delay.


The problem with embedding is that while the standard is quite clear about the number of samples
of latency in the embedding and extraction process, no statement is made about what part of this
is in the embedder and what part is in the extractor. Thus if two different makes of equipment are
involved, which will undoubtedly be the norm, there will be several samples of uncertainty in some
paths, although they may be consistent day to day.


Processing 


Processing audio in the digital domain is almost always done by a DSP. This subject is broad
enough for several textbooks, so here it is limited to a few thoughts on mixing and fading. Ideally,
a fade would change the value of each sample by a different amount, as the fader moves. In
practice, messages to the DSP take too long, and the fade value is adjusted periodically. Once
every millisecond is adequate, but if this is extended an effect called 'zipper' noise is caused. Fader
coefficients need adequate binary accuracy as well, and about 12 bits is enough from the fader
ADC if the processor interpolates values every ms or less. The output from an ADC usually has
enough random analogue noise to mask the effects of quantisation noise, but if this is faded down
by 10dB or so, then the random noise is lost and quantisation noise will be heard, especially if the
signal is subsequently faded up again. A fader or mixer must therefore reinsert random noise, or
'dither'. This can be from a pseudo-random generator, but the addition of two such is better.
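
A minimal sketch of these two ideas follows: the gain is interpolated sample by sample between two fader readings taken 1ms apart, and dither formed from the sum of two random values is added before the result is rounded back to an integer. The function names, the linear interpolation and the use of Python's `random` module in place of a hardware pseudo-random generator are all assumptions made for illustration.

```python
import random

def interpolate_gain(prev_gain, next_gain, n):
    """Linear ramp of n per-sample gains, ending at next_gain."""
    step = (next_gain - prev_gain) / n
    return [prev_gain + step * (i + 1) for i in range(n)]

def tpdf_dither(rng):
    """Sum of two uniform values: triangular distribution, peak +/-1 LSB."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

def fade_block(samples, prev_gain, next_gain, rng):
    """Apply an interpolated fade to one block and re-quantise with dither."""
    gains = interpolate_gain(prev_gain, next_gain, len(samples))
    return [int(round(s * g + tpdf_dither(rng))) for s, g in zip(samples, gains)]

rng = random.Random(1)
block = [1000] * 48                       # 1 ms of audio at 48 kHz
faded = fade_block(block, 1.0, 0.5, rng)  # fader moved from 0dB to about -6dB
print(faded[0], faded[-1])
```

Without the interpolation all 48 samples would jump to the new gain at once, which is the source of the 'zipper' effect when updates are too infrequent.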


Compressed Audio 


Compressed data audio started with the A-law used for telephony. This is called 'Instantaneous'
because each sample is recoded from a look-up table to reduce its bit-length from 12 to 8 bits.
The code comprises a sign, a range and a value, or if you prefer, a sign, a characteristic and a
mantissa. There is no delay associated with this process.
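
The sign, range and value structure can be sketched as follows. This is a simplified illustration of the principle for a 12-bit signed input, with one sign bit, three range bits and four value bits; it is not a bit-exact implementation of the telephony A-law, and the function names are hypothetical.

```python
def encode(sample):
    """Recode a 12-bit signed sample (-2048..2047) as sign, range and value."""
    sign = 0 if sample >= 0 else 1
    mag = min(abs(sample), 2047)
    if mag < 16:
        rng, val = 0, mag                       # smallest range: value kept exactly
    else:
        rng = mag.bit_length() - 4              # ranges 1 to 7
        val = (mag >> (rng - 1)) & 0x0F         # four bits below the leading one
    return (sign << 7) | (rng << 4) | val

def decode(code):
    """Expand an 8-bit code back to an approximate 12-bit sample."""
    sign = -1 if code & 0x80 else 1
    rng = (code >> 4) & 0x07
    val = code & 0x0F
    if rng == 0:
        mag = val
    else:
        mag = (val | 0x10) << (rng - 1)         # restore the leading one
        if rng > 1:
            mag |= 1 << (rng - 2)               # aim for the middle of the step
    return sign * mag

for s in (5, 100, -100, 2047):
    print(s, format(encode(s), "08b"), decode(encode(s)))
```

Each sample is handled on its own, so the coder needs no memory of earlier samples, which is why the process adds no delay.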


NICAM is 'near-instantaneous'. This derives from the process of sending the ten most significant
bits excluding sign extension. There would be no advantage in doing this for individual samples
because the range code would take as many bits again to transmit. Instead, 32 samples are stored
and the range required for the highest value sample is used for all 32 samples, thus reducing some
samples to less than 10 bits. There is an inherent delay of 1ms in this process, and the original
specs. for NICAM called for the range codes to be transmitted as soon as they were determined,
to avoid storing the samples in the receiver. In practice error-correction adds more delay than the
NICAM process. These techniques were always distinguished from irreversible compression of
dynamic range by using the word 'Companding' because they are useless unless the signal is
expanded after compressing. It's a pity the expression has been overlooked by the later techniques.
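
Block companding of this kind can be sketched in a few lines. The example assumes 14-bit input samples and a 32kHz sample rate, neither of which is stated above, and the function names are hypothetical; only the 32-sample block and the ten transmitted bits come from the text.

```python
BLOCK = 32              # samples per block, from the text
SAMPLE_RATE = 32_000    # Hz (assumed), giving the 1ms block quoted in the text

def compand_block(block):
    """Return (range_code, ten_bit_values) for one block of 14-bit samples."""
    for shift in range(5):                      # range codes 0 to 4
        coded = [s >> shift for s in block]
        if all(-512 <= c <= 511 for c in coded):
            return shift, coded                 # smallest shift that fits 10 bits
    raise ValueError("sample outside the 14-bit range")

def expand_block(shift, coded):
    """Receiver side: restore the original scale using the range code."""
    return [c << shift for c in coded]

quiet = list(range(-16, 16))
loud = [8191] + [3] * 31                        # one peak among small samples

print(compand_block(quiet)[0])                  # range code for the quiet block
shift, coded = compand_block(loud)
print(shift, expand_block(shift, coded)[:3])
print("block delay:", 1000 * BLOCK / SAMPLE_RATE, "ms")
```

In the loud block the single peak sets the range for all 32 samples, so the small samples alongside it lose their low-order bits; in the quiet block nothing is lost.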


Modern 'Compression' involves much greater reductions of bit-rate but still involves loss of some
information. It also involves much greater delays than NICAM, e.g. 24ms for 48kHz
MUSICAM and 36ms for 32kHz MUSICAM. Compression is a trade-off between bandwidth and
quality, with the advantages of digital circuit stability. Analogue is a low fixed bandwidth with low
noise to preserve dynamic range and AES has a high bandwidth able to cope with poor signal to
noise giving comparable quality. YOU have to choose which compromise you want in terms of
audio quality versus the cost of whatever bit-rate.
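
The two delay figures follow directly from the frame length. The sketch takes the Layer II frame as 1152 PCM samples, the figure shown on the bitstream slide for this section; the function name is illustrative.

```python
SAMPLES_PER_FRAME = 1152   # PCM samples in one Layer II frame

def frame_ms(sample_rate):
    """Duration of one coded frame in milliseconds."""
    return 1000 * SAMPLES_PER_FRAME / sample_rate

for fs in (48_000, 32_000):
    print(fs, "Hz:", frame_ms(fs), "ms per frame")
```

A coder must collect a whole frame before it can be coded, so the frame duration is the minimum delay before any processing time is added.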

MUSICAM, as used by Digital Audio Broadcasting (DAB) is undoubtedly a better final delivery 
system than VHF FM. There is concern though that cascading numerous codecs of various types 
does cause serious degradation of audio quality, because the technique removes almost all 
redundant information by definition. Thus two or more working independently will overlap in 
some way.


These compressed signals come in a variety of electrical interfaces. A MUSICAM coder may 
have X21 output, or have a terminal adapter unit built in to go direct into an ISDN connector in 
'S-bus' format. Four channels may be put into each channel of an AES interface (six or more are 
possible). A unit using this allows several different languages to accompany video on a standard 
DVTR. Six at 384kbit/s can appear on a telecom E1 2048kbit/s interface, and a special interface
has been defined for DAB. 

THE AES-EBU INTERFACE
TWO CHANNELS - EACH CARRYING:


20 Bits of programme audio
4 Bits of auxiliary data
1 Bit of user data
1 Bit of channel status data
1 Bit for validity flag
1 Bit for parity check
4 Bit preamble

AES3-1992 SUBFRAME FORMAT


V Validity bit
U User data bit
C Channel status bit
P Parity bit
AUX Auxiliary sample bits
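
The fields listed above can be assembled into a subframe as sketched below. The ordering (auxiliary bits, then audio least significant bit first, then V, U, C and P) and the use of even parity over everything after the preamble are assumptions about the standard, not taken from the slide, and the preamble itself is left out because it deliberately breaks the biphase-mark rule. The function name is hypothetical.

```python
def pack_subframe(audio20, aux4=0, v=0, u=0, c=0):
    """Return the 28 bits that follow the preamble, in transmission order."""
    bits = [(aux4 >> i) & 1 for i in range(4)]          # auxiliary bits, LSB first
    bits += [(audio20 >> i) & 1 for i in range(20)]     # audio sample, LSB first
    bits += [v, u, c]                                   # validity, user, channel status
    bits.append(sum(bits) & 1)                          # parity makes the total even
    return bits

sub = pack_subframe(audio20=0x12345, c=1)
print(len(sub), "bits, parity bit =", sub[-1])
```

With the four-slot preamble added this gives the 32 time slots of one subframe, two of which make up a frame.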

AES3-1992 FRAME FORMAT


[Figure: sequence of frames, each made up of Subframe 1 and Subframe 2, running up to Frame 191 and then to the start of the next block]


Preamble X Subframe 1
Preamble Y Subframe 2
Preamble Z Subframe 1 and block start

AES3-1992 STATUS BIT


Channel status information
Channel origin data (alphanumerical)
Channel destination data (alphanumerical)
Local sample address code
Time of day sample address code
Reliability flags
Cyclic redundancy check character


Source sampling frequency
Encoded channel mode
User bit management


Downstream routing instructions


Does it meet AES11-1991?


Is the biphase mark correct?

- Retiming/Reframing inherent
- Input & Output format to suit sources & destinations
- Normalises all signals

Digital Audio


MADI: WHAT IS IT?


- An AES Standard - AES10-1991
- 28 Stereo (56 mono) AES/EBU Channels
- Coaxial cables up to 50m; Fibre Optic over 1km
- 48 kHz or 44.1 kHz

Digital Audio


MADI USES


PRO AUDIO
Digital Sound desk to multitrack tape machine
e.g. Neve Capricorn to Sony 3324


AUDIO VISUAL
Theatre stage to control room interfacing
Fibre optic - no mains interference from lighting circuits
Single cable vs many multiways


BROADCAST
Distribution of many audio signals between technical areas
Distributed routing of audio within radio complexes

Digital Audio


MADI


- [illegible] Distributed Data [illegible]
- [illegible] oriented transmission
- Non-[illegible] carrier
- MADI 4-byte words from AES3
- Words must be [illegible]
- All audio channels synchronous


Digital Audio


MADI Channel Structure


[Figure: MADI channel structure diagram, not recoverable from the source]


DIGITAL VIDEO


Embedded Audio (SMPTE S17.100)

- Audio to AES3-1992 specifications
- Default 'standard' (A): synchronous, 20 bit, 48kHz
- Standards B to J define sample rate 32kHz & 48kHz; synchronous / asynchronous; 20 or 24 bit audio
- Extended data packet option for AES auxiliary data
- Recovery requires buffer of 64 samples (1.28ms)
- Minimum of 2 channels, maximum 16 (component)


Embedded Audio - Applications


- MUX / DEMUX is expensive
- AES Matrix is cheap
- Breakaway is impossible
- Audio delay (buffer must be compensated)
- Audio transitions are uncontrolled
- MAY be cheaper in SOME applications

Digital Audio


ISO 11172-3 "Musicam" Layer II Bitstream


[Figure: one frame of the bitstream, with fields labelled Header, CRC, Bit Allocation, Scale Factors and Samples]


1152 PCM Audio Samples at 48kHz in one frame of 24ms